The Calculus 
of Variations 




Bruce van Brunt 




Springer 



Universitext 



Editorial Board 
(North America): 

S. Axler 
F.W. Gehring 
K.A. Ribet 



Springer 

New York 

Berlin 

Heidelberg 

Hong Kong 

London 

Milan 

Paris 

Tokyo 




Bruce van Brunt 



The Calculus of 
Variations 



With 24 Figures 




Springer 




Bruce van Brunt 

Institute of Fundamental Sciences 
Palmerston North Campus 
Private Bag 11222 
Massey University 
Palmerston North 5301 
New Zealand 
b.vanbrunt@massey.ac.nz 

Editorial Board 
(North America): 

S. Axler 

Mathematics Department 
San Francisco State University 
San Francisco, CA 94132 
USA 

axler@sfsu.edu 
K.A. Ribet 

Mathematics Department 
University of California, Berkeley 
Berkeley, CA 94720-3840 
USA 

ribet@math.berkeley.edu 



F.W. Gehring 
Mathematics Department 
East Hall 

University of Michigan 
Ann Arbor, MI 48109-1109 
USA 

fgehring@math.lsa.umich.edu 



Mathematics Subject Classification (2000): 34Bxx, 49-01, 70Hxx 



Library of Congress Cataloging-in-Publication Data 
van Brunt, B. (Bruce) 

The calculus of variations / Bruce van Brunt. 

p. cm. — (Universitext) 

Includes bibliographical references and index. 

ISBN 0-387-40247-0 (alk. paper)

1. Calculus of variations. I. Title. 

QA315.V35 2003 

515'.64 — dc21 2003050661 

ISBN 0-387-40247-0 Printed on acid-free paper. 

© 2004 Springer-Verlag New York, Inc. 

All rights reserved. This work may not be translated or copied in whole or in part without the written 
permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, 
USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection 
with any form of information storage and retrieval, electronic adaptation, computer software, or by 
similar or dissimilar methodology now known or hereafter developed is forbidden. 

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they 
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are 
subject to proprietary rights. 

Printed in the United States of America. 

987654321 SPIN 10934548 

Typesetting: Pages created by the author in LaTeX 2e using Springer’s svmono.cls macro.
www.springer-ny.com 

Springer-Verlag New York Berlin Heidelberg 
A member of BertelsmannSpringer Science+Business Media GmbH 




To Anne, Anastasia, and Alexander 




Preface 



The calculus of variations has a long history of interaction with other branches 
of mathematics such as geometry and differential equations, and with physics, 
particularly mechanics. More recently, the calculus of variations has found 
applications in other fields such as economics and electrical engineering. Much 
of the mathematics underlying control theory, for instance, can be regarded 
as part of the calculus of variations. 

This book is an introduction to the calculus of variations for mathemati- 
cians and scientists. The reader interested primarily in mathematics will find 
results of interest in geometry and differential equations. I have paused at 
times to develop the proofs of some of these results, and discuss briefly var- 
ious topics not normally found in an introductory book on this subject such 
as the existence and uniqueness of solutions to boundary-value problems, the 
inverse problem, and Morse theory. I have made “passive use” of functional 
analysis (in particular normed vector spaces) to place certain results in con- 
text and reassure the mathematician that a suitable framework is available 
for a more rigorous study. For the reader interested mainly in techniques and 
applications of the calculus of variations, I leavened the book with numer- 
ous examples mostly from physics. In addition, topics such as Hamilton’s 
Principle, eigenvalue approximations, conservation laws, and nonholonomic
constraints in mechanics are discussed. More importantly, the book is written 
on two levels. The technical details for many of the results can be skipped 
on the initial reading. The student can thus learn the main results in each 
chapter and return as needed to the proofs for a deeper understanding. Sev- 
eral key results in this subject have tractable analogues in finite-dimensional 
optimization. Where possible, the theory is motivated by first reviewing the 
theory for finite-dimensional problems. 

The book can be used for a one-semester course, a shorter course, or in- 
dependent study. The final chapter on the second variation has been written 
with these options in mind, so that the student can proceed directly from 
Chapter 3 to this topic. Throughout the book, asterisks have been used to 
flag material that is not central to a first course. 







The target audience for this book is advanced undergraduate/beginning
graduate students in mathematics, physics, or engineering. The student is as- 
sumed to have some familiarity with linear ordinary differential equations, 
multivariable calculus, and elementary real analysis. Some of the more theo- 
retical material from these topics that is used throughout the book such as 
the implicit function theorem and Picard’s theorem for differential equations 
has been collected in Appendix A for the convenience of the reader. 

Like many textbooks in mathematics, this book can trace its origins back 
to a set of lecture notes. The transformation from lecture notes to textbook, 
however, is nontrivial, and one is faced with myriad choices that, in part, re- 
flect one’s own interests and experiences teaching the subject. While writing 
this book I kept in mind three quotes spanning a few generations of mathe- 
maticians. The first is from the introduction to a volume of Spivak’s multi- 
volume treatise on differential geometry [64]: 

I feel somewhat like a man who has tried to cleanse the Augean stables 
with a Johnny-Mop. 

It is tempting, when writing a textbook, to give some modicum of complete- 
ness. When faced with the enormity of literature on this subject, however, 
the task proves daunting, and it soon becomes clear that there is just too 
much material for a single volume. In the end, I could not face picking up 
the Johnny-Mop, and my solution to this dilemma was to be savage with 
my choice of topics. Keeping in mind that the goal is to produce a book 
that should serve as a text for a one-semester introductory course, there were 
many painful omissions. Firstly, I have tried to steer a reasonably consistent 
path by keeping the focus on the simplest type problems that illustrate a 
particular aspect of the theory. Secondly, I have opted in most cases for the 
“no frills” version of results if the “full feature” version would take us too 
far afield, or require a substantially more sophisticated mathematical back- 
ground. Topics such as piecewise smooth extremals, fields of extremals, and 
numerical methods arguably belong in any introductory account. Nonethe- 
less, I have omitted these topics in favour of other topics, such as a solution 
method for the Hamilton-Jacobi equation and Noether’s theorem, that are
accessible to the general mathematically literate undergraduate student but 
often postponed to a second course in the subject. 

The second quote comes from the introduction to Titchmarsh’s book on 
eigenfunction expansions [70]: 

I believe in the future of ‘mathematics for physicists’, but it seems 
desirable that a writer on this subject should understand both physics 
as well as mathematics. 

The words of Titchmarsh remind me that, although I am a mathematician 
interested in the applications of mathematics, I am not a physicist, and it 
is best to leave detailed accounts of physical models in the hands of experts. 
This is not to say that the material presented here lies in some vacuum of pure 







mathematics, where we merely acknowledge that the material has found some
applications. Indeed, the book is written with a definite slant towards “applied
mathematics,” but it focuses on no particular field of applied mathematics in
any depth. Often it is the application, not the mathematics, that perplexes
the student, and a study in depth of any particular field would require either
the student to have the necessary prerequisites or the author to develop the
subject. The former case restricts the potential audience; the latter case shifts
away from the main topic. In any event, I have not tried to write a book on
the calculus of variations with a particular emphasis on one of its many fields
of applications. There are many splendid books that merge the calculus of
variations with particular applications such as classical mechanics or control
theory. Such texts can be read with profit in conjunction with this book.

The third quote comes from G.H. Hardy, who made the following comment 
about A.R. Forsyth’s 656-page treatise [27] on the calculus of variations:¹

In this enormous volume, the author never succeeds in proving that
the shortest distance between two points is a straight line.

Hardy did not mince words when it came to mathematics. The prospective 
author of any text on the calculus of variations should bear in mind that, 
although there are many mathematical avenues to explore and endless minu- 
tiae to discuss, certain basic questions that can be answered by the calculus 
of variations in an elementary text should be answered. There are certain 
problems such as geodesics in the plane and the catenary that can be solved 
within our self-imposed regime of elementary theory. I do not hesitate to use 
these simple problems as examples. At the same time, I also hope to give the 
reader a glimpse of the power and elegance of a subject that has fascinated 
mathematicians for centuries. 

I wish to acknowledge the help of my former students, whose input shaped
the final form of this book. I wish also to thank Fiona Davies for helping me 
with the figures. Finally, I would like to acknowledge the help of my colleagues 
at the Institute of Fundamental Sciences, Massey University. 

The earlier drafts of many chapters were written while travelling on vari- 
ous mountaineering expeditions throughout the South Island of New Zealand. 
The hospitality of Clive Marsh and Heather North is gratefully acknowledged 
along with that of Andy Backhouse and Zoe Hart. I should also like to ac- 
knowledge the New Zealand Alpine Club, in whose huts I wrote many early 
(and later) drafts during periods of bad weather. In particular, I would like 
to thank Graham and Eileen Jackson of Unwin Hut for providing a second 
home conducive to writing (and climbing). 

Fox Glacier, New Zealand Bruce van Brunt 

February 2003 

1 F. Smithies reported this comment in an unpublished talk, “Hardy as I Knew 
Him,” given to the British Society for the History of Mathematics, 19 December 
1990. 




Contents 



1 Introduction 1 

1.1 Introduction 1 

1.2 The Catenary and Brachystochrone Problems 3 

1.2.1 The Catenary 3 

1.2.2 Brachystochrones 7

1.3 Hamilton’s Principle 10 

1.4 Some Variational Problems from Geometry 14 

1.4.1 Dido’s Problem 14 

1.4.2 Geodesics 16 

1.4.3 Minimal Surfaces 20 

1.5 Optimal Harvest Strategy 21 

2 The First Variation 23 

2.1 The Finite-Dimensional Case 23 

2.1.1 Functions of One Variable 23 

2.1.2 Functions of Several Variables 26 

2.2 The Euler-Lagrange Equation 28 

2.3 Some Special Cases 36 

2.3.1 Case I: No Explicit y Dependence 36 

2.3.2 Case II: No Explicit x Dependence 38 

2.4 A Degenerate Case 42 

2.5 Invariance of the Euler-Lagrange Equation 44 

2.6 Existence of Solutions to the Boundary-Value Problem* 49

3 Some Generalizations 55 

3.1 Functionals Containing Higher-Order Derivatives 55 

3.2 Several Dependent Variables 60 

3.3 Two Independent Variables* 65 

3.4 The Inverse Problem* 70 







4 Isoperimetric Problems 73 

4.1 The Finite-Dimensional Case and Lagrange Multipliers 73 

4.1.1 Single Constraint 73 

4.1.2 Multiple Constraints 77 

4.1.3 Abnormal Problems 79 

4.2 The Isoperimetric Problem 83 

4.3 Some Generalizations on the Isoperimetric Problem 94 

4.3.1 Problems Containing Higher-Order Derivatives 95 

4.3.2 Multiple Isoperimetric Constraints 96 

4.3.3 Several Dependent Variables 99 

5 Applications to Eigenvalue Problems* 103 

5.1 The Sturm-Liouville Problem 103 

5.2 The First Eigenvalue 109 

5.3 Higher Eigenvalues 115 

6 Holonomic and Nonholonomic Constraints 119 

6.1 Holonomic Constraints 119 

6.2 Nonholonomic Constraints 125 

6.3 Nonholonomic Constraints in Mechanics* 131 

7 Problems with Variable Endpoints 135 

7.1 Natural Boundary Conditions 135 

7.2 The General Case 144 

7.3 Transversality Conditions 150 

8 The Hamiltonian Formulation 159 

8.1 The Legendre Transformation 160 

8.2 Hamilton’s Equations 164 

8.3 Symplectic Maps 171 

8.4 The Hamilton-Jacobi Equation 175

8.4.1 The General Problem 175 

8.4.2 Conservative Systems 181 

8.5 Separation of Variables 184 

8.5.1 The Method of Additive Separation 185 

8.5.2 Conditions for Separable Solutions* 190 

9 Noether’s Theorem 201 

9.1 Conservation Laws 201 

9.2 Variational Symmetries 202 

9.3 Noether’s Theorem 207 

9.4 Finding Variational Symmetries 213 







10 The Second Variation 221 

10.1 The Finite-Dimensional Case 221 

10.2 The Second Variation 224 

10.3 The Legendre Condition 227 

10.4 The Jacobi Necessary Condition 232 

10.4.1 A Reformulation of the Second Variation 232 

10.4.2 The Jacobi Accessory Equation 234 

10.4.3 The Jacobi Necessary Condition 237 

10.5 A Sufficient Condition 241 

10.6 More on Conjugate Points 244 

10.6.1 Finding Conjugate Points 245 

10.6.2 A Geometrical Interpretation 249 

10.6.3 Saddle Points* 254 

10.7 Convex Integrands 257 

A Analysis and Differential Equations 261 

A.1 Taylor’s Theorem 261

A.2 The Implicit Function Theorem 265

A.3 Theory of Ordinary Differential Equations 268

B Function Spaces 273

B.1 Normed Spaces 273

B.2 Banach and Hilbert Spaces 278 

References 283 



Index 287




1 



Introduction 



1.1 Introduction 

The calculus of variations is concerned with finding extrema and, in this sense, 
it can be considered a branch of optimization. The problems and techniques 
in this branch, however, differ markedly from those involving the extrema 
of functions of several variables owing to the nature of the domain on the 
quantity to be optimized. A functional is a mapping from a set of functions 
to the real numbers. The calculus of variations deals with finding extrema 
for functionals as opposed to functions. The candidates in the competition 
for an extremum are thus functions as opposed to vectors in $\mathbb{R}^n$, and this
gives the subject a distinct character. The functionals are generally defined 
by definite integrals; the sets of functions are often defined by boundary con- 
ditions and smoothness requirements, which arise in the formulation of the 
problem/model. 

The calculus of variations is nearly as old as the calculus, and the two 
subjects were developed somewhat in parallel. In 1927 Forsyth [27] noted that 
the subject “attracted a rather fickle attention at more or less isolated intervals 
in its growth.” In the eighteenth century, the Bernoulli brothers, Newton, 
Leibniz, Euler, Lagrange, and Legendre contributed to the subject, and their 
work was extended significantly in the next century by Jacobi and Weierstrass.
Hilbert [38], in his renowned 1900 lecture to the International Congress of
Mathematicians, outlined 23 (now famous) problems for mathematicians. His
23rd problem is entitled Further development of the methods of the calculus
of variations. Immediately before describing the problem, he remarks: 

... I should like to close with a general problem, namely with the 
indication of a branch of mathematics repeatedly mentioned in this 
lecture — which, in spite of the considerable advancement lately given 
it by Weierstrass, does not receive the general appreciation which in
my opinion it is due — I mean the calculus of variations. 







Hilbert’s lecture perhaps struck a chord with mathematicians.¹ In the early
twentieth century Hilbert, Noether, Tonelli, Lebesgue, and Hadamard among
others made significant contributions to the field. Although by Forsyth’s time 
the subject may have “attracted rather fickle attention,” many of those who 
did pay attention are numbered among the leading mathematicians of the 
last three centuries. The reader is directed to Goldstine [36] for an in-depth 
account of the history of the subject up to the late nineteenth century. 

The enduring interest in the calculus of variations is in part due to its ap- 
plications. Of particular note is the relationship of the subject with classical 
mechanics, where it crosses the boundary from being merely a mathemati- 
cal tool to encompassing a general philosophy. Variational principles abound 
in physics and particularly in mechanics. The application of these principles 
usually entails finding functions that minimize definite integrals (e.g., energy 
integrals) and hence the calculus of variations comes naturally to the fore. 
Hamilton’s Principle in classical mechanics is a prominent example. An earlier 
example is Fermat’s Principle of Minimum Time in geometrical optics. The 
development of the calculus of variations in the eighteenth and nineteenth 
centuries was motivated largely by problems in mechanics. Most textbooks on 
classical mechanics (old and new) discuss the calculus of variations in some 
depth. Conversely, many books on the calculus of variations discuss applica-
tions to classical mechanics in detail. In the introduction to his book [21],
Carathéodory states:

I have never lost sight of the fact that the calculus of variations, as it 
is presented in Part II, should above all be a servant of mechanics. 

Certainly there is an intimate relationship between mechanics and the cal- 
culus of variations, but this should not completely overshadow other fields 
where the calculus of variations also has applications. Aside from applications 
in traditional fields of continuum mechanics and electromagnetism, the calcu- 
lus of variations has found applications in economics, urban planning, and a 
host of other “nontraditional fields.” Indeed, the theory of optimal control is 
centred largely around the calculus of variations. 

Finally, it should be noted that the calculus of variations does not exist in a
mathematical vacuum or as a closed chapter of classical analysis. Historically, 
this field has always intersected with geometry and differential equations, 
and continues to do so. In 1974, Stampacchia [17], writing on Hilbert’s 23rd 
problem, summed up the situation: 

One might infer that the interest in this branch of Analysis is weak- 
ening and that the Calculus of Variations is a Chapter of Classical 
Analysis. In fact this inference would be quite wrong since new prob- 
lems like those in control theory are closely related to the problems of 

1 His nineteenth and twentieth problems were also devoted to the calculus of vari- 
ations. 








the Calculus of Variations while classical theories, like that of bound- 
ary value problems for partial differential equations, have been deeply 
affected by the development of the Calculus of Variations. Moreover, 
the natural development of the Calculus of Variations has produced 
new branches of mathematics which have assumed different aspects 
and appear quite different from the Calculus of Variations. 

The field is far from dead and it continues to attract new researchers. 

In the remainder of this chapter we discuss some typical problems in the 
calculus of variations that are easy to model (although perhaps not so easy 
to solve). These problems illustrate the above comments and give the reader 
a taste of the subject. We return to most of these examples later in the book 
as the mathematics to solve them develops. 



1.2 The Catenary and Brachystochrone Problems 

1.2.1 The Catenary 

Consider a thin heavy uniform flexible cable suspended from the top of two 
poles of height $y_0$ and $y_1$ spaced a distance $d$ apart (figure 1.1). At the base of
each pole the cable is assumed to be coiled. The cable follows up the pole to 
the top, runs through a pulley, and then spans the distance d to the next pole. 
The problem is to determine the shape of the cable between the two poles. 

The cable will assume the shape that makes the potential energy minimum. 
The potential energy associated with the vertical parts of the cable will be 
the same for any configuration of the cable and hence we may ignore this 
component. If m denotes the mass per unit length of the cable and g the 
gravitational constant, the potential energy of the cable between the poles is 







$$W_p(y) = \int_0^L m g\, y(s)\, ds, \tag{1.1}$$

where $s$ denotes arclength, and $y(s)$ denotes the height of the cable above the
ground $s$ units in length along the cable from the top of the pole at $(x_0, y_0)$.
The number $L$ denotes the arclength of the cable from $(x_0, y_0)$ to $(x_1, y_1)$.
Unfortunately, we do not know $L$ in this formulation. We can, however, recast
the above expression for $W_p$ in terms of Cartesian coordinates since we
do know the coordinates of the pole tops. The differential arclength element
in Cartesian coordinates is given by $ds = \sqrt{1 + y'^2}\, dx$, and this leads to the
following expression for $W_p$,

$$W_p(y) = \int_{x_0}^{x_1} m g\, y(x) \sqrt{1 + y'^2(x)}\, dx. \tag{1.2}$$

Note that unlike our first expression for $W_p$, the above one involves the deriva-
tive of $y$. We have implicitly assumed here that the solution curve can be
represented by a function $y : [x_0, x_1] \to \mathbb{R}$ and that this function is continuous
and at least piecewise differentiable. Given the nature of the problem these
seem reasonable assumptions.

The cable will assume the shape that minimizes $W_p$. The constant factor
$mg$ in the expression for $W_p$ can be ignored for the purposes of optimizing the
potential energy. The essence of the problem is thus to determine a function
$y$ such that the quantity

$$J(y) = \int_{x_0}^{x_1} y \sqrt{1 + y'^2}\, dx \tag{1.3}$$

is minimum. The model requires that any candidate $y$ for an extremum sat-
isfies the boundary conditions

$$y(x_0) = y_0, \quad y(x_1) = y_1. \tag{1.4}$$

In addition, the candidates must also be continuous and at least piecewise
differentiable in the interval $[x_0, x_1]$.

We find the extrema for $J$ in Chapter 2, where we show that the shape of
the cable can be described by a hyperbolic cosine function. The curve itself is
called a catenary.²
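As a numerical illustration of the functional (1.3), the following Python sketch evaluates $J$ for three curves that satisfy the same boundary conditions (1.4). The endpoint values, the two comparison curves, and the helper names `simpson` and `J` are assumptions made for this example and do not come from the text; the quadrature is an ordinary composite Simpson rule.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def J(y, dy, x0, x1):
    """Functional (1.3): integral of y * sqrt(1 + y'^2) over [x0, x1]."""
    return simpson(lambda x: y(x) * math.sqrt(1.0 + dy(x) ** 2), x0, x1)

# Illustrative boundary data (an assumption): y(-1) = y(1) = cosh(1).
x0, x1 = -1.0, 1.0
y_end = math.cosh(1.0)

# Each candidate is a pair (y, y') meeting the boundary conditions (1.4).
candidates = {
    "hyperbolic cosine": (math.cosh, math.sinh),
    "horizontal line": (lambda x: y_end, lambda x: 0.0),
    "parabola": (lambda x: y_end + 0.5 * (x * x - 1.0), lambda x: x),
}

values = {name: J(y, dy, x0, x1) for name, (y, dy) in candidates.items()}
for name, val in values.items():
    print(f"{name:18s} J = {val:.5f}")
```

For these particular endpoints the hyperbolic cosine yields the smallest of the three values. This is consistent with the result derived in Chapter 2, although a comparison against two hand-picked curves is of course not a proof.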

The same functional J arises in a problem in geometry concerning a min- 
imal surface of revolution, i.e., a surface of revolution having minimal surface 
area. Suppose that the x-axis corresponds to the axis of rotation. Any surface 
of revolution can be generated by a curve in the xy-plane (figure 1.2). The 

2 The name “catenary” is particularly descriptive. The name comes from the Latin 
word catena meaning chain. Catenary refers to the curve formed by a uniform 
chain hanging freely between two poles. Leibniz is credited with coining the term 
(ca. 1691). 








problem thus translates to finding the curve $\gamma$ that generates the surface of
revolution having the minimal surface area. As with the catenary problem, we
make the assumption that $\gamma$ can be described by a function $y : [x_0, x_1] \to \mathbb{R}$
that is continuous and piecewise differentiable in the interval $[x_0, x_1]$. Under
these assumptions we have that the surface area of the corresponding surface
of revolution is

$$A(y) = 2\pi \int_{x_0}^{x_1} |y(x)| \sqrt{1 + y'^2(x)}\, dx. \tag{1.5}$$

Here we must also assume that $y(x) > 0$ for all $x \in [x_0, x_1]$.³ The
problem of finding the minimal surface thus reduces to finding the function $y$
such that the quantity

$$J(y) = \int_{x_0}^{x_1} y \sqrt{1 + y'^2}\, dx$$

is minimum. The two problems thus produce the same functional to be mini- 
mized. The generating curve that produces the minimal surface of revolution 
is thus a catenary. The surface itself is called a catenoid. 

3 If $y = 0$ at some point $x \in (x_0, x_1)$ we can still generate a rotationally symmetric
“object,” but technically it would not be a surface. Near $(x, 0, 0)$ the “object”
would resemble (i.e., be homeomorphic to) a double cone. The double cone fails
the requirements to be a surface because any neighbourhood containing the com-
mon vertex is not homeomorphic to the plane.







Let us return to the original problem. A modification of the problem would 
be to first specify the length of the cable. Evidently, if L is the length of the 
cable we must require that 

$$L \ge \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}$$

in order that the cable span the two poles. Moreover, it is intuitively clear
that in the case of equality there is only one configuration possible, viz., the
line segment from $(x_0, y_0)$ to $(x_1, y_1)$. In this case, there is no optimization to
be done as there is only one candidate. We may thus restrict our attention to
the case

$$L > \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}.$$

Given a cable of length L, the problem is to determine the shape the cable 
assumes when supported between the poles. The problem was posed by Jacob 
Bernoulli in 1690. By the end of 1691 the problem was solved by Leibniz, 
Huygens, and Jacob’s younger brother Johann Bernoulli. It should be noted 
that Galileo had earlier considered the problem, but he thought the catenary 
was essentially a parabola.⁴

Since the arclength L of the cable is given, we can use expression (1.1) 
to look for a minimum potential energy configuration. Instead, we start 
with expression (1.2). The modified problem is now to find the function 
$y : [x_0, x_1] \to \mathbb{R}$ such that $W_p$ is minimized subject to the arclength con-
straint

$$L = \int_{x_0}^{x_1} \sqrt{1 + y'^2}\, dx, \tag{1.6}$$

and the boundary conditions

$$y(x_0) = y_0, \quad y(x_1) = y_1.$$

This problem is thus an example of a constrained variational problem. The 
constraint (1.6) can be regarded as an integral equation (with, it is hoped, 
nonunique solutions). Constraints such as (1.6) are called isoperimetric. We 
discuss problems having isoperimetric constraints in Chapter 4. 
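To make the role of the constraint (1.6) concrete, the sketch below computes the arclength integral numerically for two curves through the same endpoints and compares each with the straight-line distance between the pole tops. The endpoints, the two curves, and the function name `arclength` are illustrative assumptions; only the integrand $\sqrt{1 + y'^2}$ is taken from the text.

```python
import math

def arclength(dy, x0, x1, n=2000):
    """Approximate the integral in (1.6) by the composite Simpson rule (n even)."""
    f = lambda x: math.sqrt(1.0 + dy(x) ** 2)
    h = (x1 - x0) / n
    total = f(x0) + f(x1)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(x0 + i * h)
    return total * h / 3

# Illustrative endpoints (an assumption): pole tops at (-1, cosh 1) and (1, cosh 1).
x0, x1 = -1.0, 1.0
y0 = y1 = math.cosh(1.0)

# Straight-line distance between the pole tops, the lower bound on L.
chord = math.hypot(x1 - x0, y1 - y0)

# Two curves through the endpoints, specified by their derivatives.
length_cosh = arclength(math.sinh, x0, x1)          # y = cosh x
length_parabola = arclength(lambda x: x, x0, x1)    # y = y0 + (x^2 - 1)/2

print(f"chord    = {chord:.5f}")
print(f"cosh     = {length_cosh:.5f}")
print(f"parabola = {length_parabola:.5f}")
```

Both curves are longer than the chord, and they differ in length from one another. Once $L$ is prescribed, only curves whose arclength equals $L$ remain admissible, which is what distinguishes the constrained problem from the unconstrained one.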

Suppose that we use expression (1.1), which prima facie seems simpler 
than expression (1.2). We know L, so that the limits of the integral are known, 
but the parameter s is special and corresponds to arclength. We must some- 
how build in the requirement that s is arclength if we are to use expression 
(1.1). In order to do this we must use a parametric representation of the curve 
$(x(s), y(s))$, $s \in [0, L]$. The arclength parameter for such a curve is character-
ized by the differential equation

$$x'^2(s) + y'^2(s) = 1. \tag{1.7}$$

4 There is still some dispute regarding whether Galileo thought the catenary to be 
the parabola. See Giaquinta and Hildebrandt [32], p. 133 for more details. 







The problem thus entails finding the functions $x(s)$, $y(s)$ that minimize $W_p$
subject to the constraint (1.7) and the boundary conditions

$$x(0) = x_0, \quad x(L) = x_1,$$
$$y(0) = y_0, \quad y(L) = y_1.$$

In general, a constraint of this kind is more difficult to deal with than an 
isoperimetric constraint. 

1.2.2 Brachystochrones 

The history of the calculus of variations essentially begins with a problem 
posed by Johann Bernoulli (1696) as a challenge to the mathematical com- 
munity and in particular to his brother Jacob. (There was significant sibling 
rivalry between the two brothers.) The problem is important in the history of 
the calculus of variations because the method developed by Johann’s pupil, 
Euler, to solve this problem provided a sufficiently general framework to solve 
other variational problems. 

The problem that Johann posed was to find the shape of a wire along 
which a bead initially at rest slides under gravity from one end to the other 
in minimal time. The endpoints of the wire are specified and the motion of 
the bead is assumed frictionless. The curve corresponding to the shape of the 
wire is called a brachystochrone⁵ or a curve of fastest descent.

The problem attracted the attention of a number of mathematical luminar-
ies including Huygens, L’Hôpital, Leibniz, and Newton, in addition of course
to the Bernoulli brothers, and later Euler and Lagrange. This problem was at 
the cutting edge of mathematics at the turn of the eighteenth century. 

Jacob was up to the challenge and solved the problem. Meanwhile (and 
independently) Johann and Leibniz also arrived at correct solutions. Newton 
was late to the party because he learned about the problem some six months 
later than the others. Nonetheless, he solved the problem that same evening 
and sent his solution anonymously the next day to Johann. Newton’s cover 
was blown instantly. Upon looking at the solution, Johann exclaimed “Ah! I 
recognize the paw of the lion.” 

To model Bernoulli’s problem we use Cartesian coordinates with the pos-
itive $y$-axis oriented in the direction of the gravitational force (figure 1.3).
Let $(x_0, y_0)$ and $(x_1, y_1)$ denote the coordinates of the initial and final posi-
tions of the bead, respectively. Here, we require that $x_0 < x_1$ and $y_0 < y_1$.
The Bernoulli problem consists of determining, among the curves that have
$(x_0, y_0)$ and $(x_1, y_1)$ as endpoints, the curve on which the bead slides down
from $(x_0, y_0)$ to $(x_1, y_1)$ in minimum time. The problem makes sense only for
continuous curves. We make the additional simplifying (but reasonable) as-
sumptions that the curve can be represented by a function $y : [x_0, x_1] \to \mathbb{R}$

5 The word comes from the Greek words brakhistos meaning “shortest” and khronos 
meaning time. 




Fig. 1.3.



and that $y$ is at least piecewise differentiable in the interval $[x_0, x_1]$. Now, the
total time it takes the bead to slide down a curve is given by

$$T(y) = \int_0^L \frac{ds}{v(s)}, \tag{1.8}$$



where $L$ denotes the arclength of the curve, $s$ is the arclength parameter, and
$v$ is the velocity of the bead $s$ units down the curve from $(x_0, y_0)$. As with
the catenary problem, we do not know the value of $L$, so we must seek an
alternative formulation.

Our first job is to get an expression for the velocity in terms of the function 
y. We use the law of conservation of energy to achieve this. At any position 
(x,y( x)) on the curve, the sum of the potential and kinetic energies of the 
bead is a constant. Hence 

$$\frac{1}{2} m v^2(x) + m g y(x) = c, \tag{1.9}$$

where $m$ is the mass of the bead, $v$ is the velocity of the bead at $(x, y(x))$, and
$c$ is a constant. Since the energy is constant along the curve, we know that
$c = \frac{1}{2} m v^2(x_0) + m g y(x_0)$. Solving equation (1.9) for $v$ gives

$$v(x) = \sqrt{\frac{2c}{m} - 2 g y(x)}.$$



Equation (1.8) thus implies that 








Fig. 1.4. 



$$T(y) = \int_{x_0}^{x_1} \frac{\sqrt{1 + y'^2}}{\sqrt{\frac{2c}{m} - 2 g y(x)}}\, dx. \tag{1.10}$$

We thus seek a function y such that T is minimum and 

$$y(x_0) = y_0, \quad y(x_1) = y_1.$$

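Note that for the purposes of optimization $T$ can be replaced by the functional

$$J(w) = \int_{x_0}^{x_1} \frac{\sqrt{1 + w'^2}}{\sqrt{w(x)}}\, dx$$

and the relation

$$w(x) = \frac{1}{2g}\left(\frac{2c}{m} - 2 g y(x)\right) \tag{1.11}$$

(the $\sqrt{2g}$ factor does not affect the extrema of $J$).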

In Chapter 2 we find the extrema for J (and hence T), and show that 
the brachystochrone for this problem is a portion of a special type of curve 
called a cycloid. Figure 1.4 depicts a cycloid. You can visualize a cycloid in 
the safety of your own home by painting a white dot on a clean tyre and then 
rolling the tyre along a line. If you can follow the rolling dot, the curve traced 
out is a cycloid. Before the fabulous Bernoulli brothers came on the stage, 
Christiaan Huygens had already discovered a remarkable property of cycloids. 








Christiaan discovered that a bead sliding down a cycloid generated by a circle
of radius $\rho$ under gravity reaches the bottom of the cycloid arch after the
period $\pi \sqrt{\rho / g}$ wherever on the arch the bead starts from rest. This notable
property of the cycloid earned it the appellation isochrone. The cycloid thus 
sports the names isochrone and brachystochrone. 6 Christiaan used the curve 
to good effect and designed what was then considered a remarkably accurate 
pendulum clock based on the laudable properties of the cycloid, which was 
used to govern the motion of the pendulum. The reader may find a diagram 
of the pendulum and further details on this interesting curve in an article by 
Tee [67] wherein several original references may be found. 
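The isochrone property can be checked numerically with a small sketch. It assumes an inverted cycloid parametrized by $x = \rho(\theta - \sin\theta)$, $y = -\rho(1 - \cos\theta)$, whose lowest point is at $\theta = \pi$; the helper names, the starting angles, and the values of $\rho$ and g are illustrative assumptions. The descent time is estimated on a fine polyline, using energy conservation on each segment.

```python
import math

def descent_time(points, g=9.81):
    """Time for a bead released from rest at points[0] to slide along the
    polyline through `points` (constant acceleration on each segment)."""
    y0 = points[0][1]
    total = 0.0
    for (xa, ya), (xb, yb) in zip(points, points[1:]):
        ds = math.hypot(xb - xa, yb - ya)
        va = math.sqrt(max(0.0, 2.0 * g * (y0 - ya)))
        vb = math.sqrt(max(0.0, 2.0 * g * (y0 - yb)))
        total += 2.0 * ds / (va + vb)
    return total

def time_to_bottom(theta0, rho=1.0, g=9.81, n=20000):
    """Bead released from rest at cycloid parameter theta0; returns the
    approximate time needed to reach the lowest point (theta = pi)."""
    thetas = [theta0 + (math.pi - theta0) * k / n for k in range(n + 1)]
    arch = [(rho * (t - math.sin(t)), -rho * (1.0 - math.cos(t))) for t in thetas]
    return descent_time(arch, g)

rho, g = 1.0, 9.81
expected = math.pi * math.sqrt(rho / g)
times = [time_to_bottom(theta0, rho, g) for theta0 in (0.5, 1.5, 2.5)]
print(expected, times)
```

All three starting points give essentially the same time, in agreement with the period $\pi\sqrt{\rho/g}$ quoted above.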

Finally, we note that brachystochrone problems have proliferated in the 
three centuries following Bernoulli’s challenge. Some models subjected the 
bead to a resisting medium whilst others changed the force field from a simple 
uniform gravitational field to more complicated scenarios. Research is still 
progressing on brachystochrones. The reader is directed to the work of Tee
[67], [68], [69] for more references. 



1.3 Hamilton’s Principle 

There are many fine books on classical (analytical) mechanics (e.g., [1], [6], 
[35], [48], [49], [59], and [73]) and we make no attempt here to give even a basic 
account of this seemingly vast subject. Nonetheless, it would be demeaning 
to the calculus of variations to ignore its rich heritage and fruitful interaction 
with classical mechanics. Moreover, many of our examples come from classical 
mechanics, so a few words from our sponsor seem in order. 

Classical mechanics is teeming with variational principles of which Hamil- 
ton’s Principle is perhaps the most important. 7 In this section we give a brief 
“no frills” statement of Hamilton’s Principle as it applies to the motion of 
particles. The serious student of mechanics should consult one of the many 
specialized texts on this subject. 

Let us first consider the motion of a single particle in $\mathbb{R}^3$. Let $\mathbf{r}(t) =
(x(t), y(t), z(t))$ denote the position of the particle at time t. The kinetic
energy of this particle is given by

$$T = \frac{1}{2} m \left( x'^2(t) + y'^2(t) + z'^2(t) \right),$$

where m is the mass of the particle and ' denotes d/dt. We assume that the 
forces on the particle can be derived from a single scalar function. Specifically, 
we assume there is a function V such that: 

6 It is also called a tautochrone, but we do not count this since the word is derived 
from the Greek word tauto meaning “same.” The prefix iso comes from the Greek 
word isos, which also means “same.” 

7 One need only scan through Lanczos’ book [48] to find the “Principle of Vir- 
tual Work,” “D’Alembert’s Principle,” “Gauss’ Principle of Least Constraint,” 
“Jacobi’s Principle,” and, of course, “Hamilton’s Principle” among others. 







1. V depends only on time and position; i.e., $V = V(t, x, y, z)$;

2. the force $\mathbf{f} = (f_1, f_2, f_3)$ acting on the particle has the components

$$f_1 = -\frac{\partial V}{\partial x}, \qquad f_2 = -\frac{\partial V}{\partial y}, \qquad f_3 = -\frac{\partial V}{\partial z}.$$

The function V is called the potential energy. Let 

L = T — V. 

The function L is called the Lagrangian. Suppose that the initial position of
the particle $\mathbf{r}(t_0)$ and final position $\mathbf{r}(t_1)$ are specified. Hamilton’s Principle
states that the path of the particle $\mathbf{r}(t)$ in the time interval $[t_0, t_1]$ is such that
the functional

$$J(\mathbf{r}) = \int_{t_0}^{t_1} L(t, \mathbf{r}, \mathbf{r}') \, dt$$

is stationary, i.e., a local extremum or a “saddle point.” (We define “station- 
ary” more precisely in Section 2.2.) In the lingo of mechanics J is called the 
action integral or simply the action. 

Problems in mechanics often involve several particles (or spatial coordinates); 
moreover, Cartesian coordinates are not always the best choice. Variational 
principles are thus usually given in terms of generalized coordinates. 
The letter q has been universally adopted to denote generalized position 
coordinates. The configuration of a system at time t is thus denoted by
$\mathbf{q}(t) = (q_1(t), \ldots, q_n(t))$, where the $q_k$ are position variables. If, for example,
the system consists of three free particles in $\mathbb{R}^3$ then n = 9.

The kinetic energy T of a system is given by a quadratic form in the
generalized velocities $q_k'$,

$$T(\mathbf{q}, \mathbf{q}') = \frac{1}{2} \sum_{j,k=1}^{n} C_{jk}(\mathbf{q}) \, q_j' q_k'.$$

Assuming the system has a potential energy function $V(t, \mathbf{q})$, the Lagrangian
is given by

$$L(t, \mathbf{q}, \mathbf{q}') = T(\mathbf{q}, \mathbf{q}') - V(t, \mathbf{q}).$$

In this framework Hamilton’s Principle takes the following form. 

Theorem 1.3.1 (Hamilton’s Principle) The motion of a system of particles
$\mathbf{q}(t)$ from a given initial configuration $\mathbf{q}(t_0)$ to a given final configuration
$\mathbf{q}(t_1)$ in the time interval $[t_0, t_1]$ is such that the functional

$$J(\mathbf{q}) = \int_{t_0}^{t_1} L(t, \mathbf{q}, \mathbf{q}') \, dt$$



is stationary. 








Fig. 1.5. 



The dynamics of a system of particles is thus completely contained in the 
single scalar function L. We can derive the familiar equations of motion from 
Hamilton’s Principle (cf. Section 3.2). The reader might rightfully question 
whether the motion predicted by Hamilton’s Principle depends on the choice 
of coordinates. The variational approach would surely be of limited value were 
it sensitive to the observer’s choice of coordinates. We show in Section 2.5 that 
Hamilton’s Principle produces equations that are necessarily invariant with 
respect to coordinate choices. 

Example 1.3.1: Simple Pendulum 

Consider a simple pendulum of mass m and length $\ell$ in the plane. Let
$(x(t), y(t))$ denote the position of the mass at time t. Since $x^2 + y^2 = \ell^2$
we need in fact only one position variable. Rather than use x or y it is natural
to use polar coordinates and characterize the position of the mass at time t
by the angle $\phi(t)$ between the vertical and the string to which the mass is
attached (figure 1.5). Now, the kinetic energy is

$$T = \frac{1}{2} m \left( x'^2(t) + y'^2(t) \right) = \frac{1}{2} m \ell^2 \phi'^2(t),$$

and the potential energy is

$$V = mgh = mg\ell (1 - \cos \phi(t)),$$

where g is a gravitation constant. Thus,

$$L(\phi, \phi') = \frac{1}{2} m \ell^2 \phi'^2 - mg\ell (1 - \cos \phi),$$

and Hamilton’s Principle implies that the motion from a given initial angle
$\phi(t_0)$ to a fixed angle $\phi(t_1)$ is such that the functional

$$J(\phi) = \int_{t_0}^{t_1} \left( \frac{1}{2} m \ell^2 \phi'^2 - mg\ell (1 - \cos \phi) \right) dt$$

is stationary.
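The stationarity claim for the pendulum can be explored numerically. The sketch below is an illustration under stated assumptions, not a derivation: it discretizes the action with a finite-difference velocity and a trapezoidal rule for the potential, generates a path with the Störmer-Verlet recursion (which satisfies the stationarity conditions of exactly this discrete action), and estimates the first-order change of J under a perturbation $\varepsilon \eta$ that vanishes at both endpoints. All names and parameter values (`pendulum_path`, `action`, `first_variation`, g = 9.81, m = ℓ = 1, the time interval) are hypothetical choices.

```python
import math

G_ACC, ELL, MASS = 9.81, 1.0, 1.0   # illustrative values

def pendulum_path(phi0, omega0, t1, n):
    """Stormer-Verlet solution of phi'' = -(g/ell) sin(phi) on [0, t1]."""
    h = t1 / n
    acc = lambda p: -(G_ACC / ELL) * math.sin(p)
    phi = [phi0, phi0 + h * omega0 + 0.5 * h * h * acc(phi0)]
    for _ in range(n - 1):
        phi.append(2.0 * phi[-1] - phi[-2] + h * h * acc(phi[-1]))
    return phi

def action(phi, t1):
    """Discrete version of J(phi): finite-difference velocity on each step,
    trapezoidal rule for the potential energy."""
    h = t1 / (len(phi) - 1)
    total = 0.0
    for a, b in zip(phi, phi[1:]):
        kinetic = 0.5 * MASS * ELL**2 * ((b - a) / h) ** 2
        potential = 0.5 * MASS * G_ACC * ELL * ((1 - math.cos(a)) + (1 - math.cos(b)))
        total += h * (kinetic - potential)
    return total

def first_variation(phi, t1, eps=1e-3):
    """Central-difference estimate of d/d(eps) J(phi + eps*eta) at eps = 0,
    where eta(t) = sin(pi t / t1) vanishes at both endpoints."""
    n = len(phi) - 1
    eta = [math.sin(math.pi * k / n) for k in range(n + 1)]
    plus = action([p + eps * e for p, e in zip(phi, eta)], t1)
    minus = action([p - eps * e for p, e in zip(phi, eta)], t1)
    return (plus - minus) / (2.0 * eps)

t1, n = 0.5, 1000
path = pendulum_path(0.3, 0.0, t1, n)
# A straight-line interpolation between the same endpoint angles, for contrast.
straight = [path[0] + (path[-1] - path[0]) * k / n for k in range(n + 1)]

fv_path = first_variation(path, t1)
fv_straight = first_variation(straight, t1)
print(fv_path, fv_straight)
```

The first variation is negligible along the computed motion but clearly nonzero along the straight-line interpolation, which has the same endpoints but is not a motion of the pendulum.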







Example 1.3.2: Kepler problem 

The Kepler problem models planetary motion. It is one of the most heavily 
studied problems in classical mechanics. Keeping with our no frills approach, 
we consider the simplest problem of a single planet orbiting around the sun, 
and ignore the rest of the solar system. Assuming the sun is fixed at the origin, 
the kinetic energy of the planet is 

$$T = \frac{1}{2} m \left( x'^2(t) + y'^2(t) \right) = \frac{1}{2} m \left( r'^2(t) + r^2(t) \theta'^2(t) \right),$$

where r and $\theta$ denote polar coordinates and m is the mass of the planet.
We can deduce the potential energy function V from the gravitational law of
attraction

$$f = -\frac{GmM}{r^2},$$

where f is the force (acting in the radial direction), M is the mass of the sun,
and G is the universal gravitation constant. Given that

$$f = -\frac{\partial V}{\partial r},$$

we have

$$V(r) = -\int f(r) \, dr = -\frac{GmM}{r};$$

hence,

$$L(r, \theta) = \frac{1}{2} m \left( r'^2 + r^2 \theta'^2 \right) + \frac{GmM}{r}.$$

Hamilton’s Principle implies that the motion of the planet from an initial
observation $(r(t_0), \theta(t_0))$ to a final observation $(r(t_1), \theta(t_1))$ is such that

$$\int_{t_0}^{t_1} \left( \frac{1}{2} m \left( r'^2 + r^2 \theta'^2 \right) + \frac{GmM}{r} \right) dt$$

is stationary.
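A quick numerical check of this statement can be made for the simplest orbit. The sketch below assumes a circular orbit $r = R$, $\theta = \omega t$ in normalized units G = M = m = R = 1, and perturbs only the radius by $\varepsilon \sin(\pi t / t_1)$, which vanishes at both ends of the time interval. The function name `orbit_action`, the units, and the perturbation are illustrative assumptions. The first-order change of the action should vanish when $\omega^2 = GM/R^3$ and not otherwise.

```python
import math

G = M = m = 1.0   # normalized units (an assumption of this sketch)
R = 1.0           # radius of the circular orbit
OMEGA = math.sqrt(G * M / R**3)   # angular speed at which gravity supplies the centripetal force

def orbit_action(eps, omega=OMEGA, t1=1.0, n=2000):
    """Midpoint-rule value of the action for r = R + eps*eta(t), theta = omega*t,
    with eta(t) = sin(pi t / t1)."""
    h = t1 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        eta = math.sin(math.pi * t / t1)
        deta = (math.pi / t1) * math.cos(math.pi * t / t1)
        r, dr = R + eps * eta, eps * deta
        lagrangian = 0.5 * m * (dr**2 + r**2 * omega**2) + G * m * M / r
        total += h * lagrangian
    return total

eps = 1e-3
fv_kepler = (orbit_action(eps) - orbit_action(-eps)) / (2.0 * eps)
fv_wrong = (orbit_action(eps, 1.2 * OMEGA) - orbit_action(-eps, 1.2 * OMEGA)) / (2.0 * eps)
print(fv_kepler, fv_wrong)
```

Only a radial perturbation is tested here, so this is a consistency check rather than a verification of stationarity with respect to all admissible variations.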



The reader may be wondering about the fate of the constant of integration
in the last example. Any potential energy of the form $-GmM/r + \text{const.}$ will
produce the requisite force f. In the pendulum problem we tacitly assumed
that the potential energy was proportional to the height of the mass above the
minimum possible height. In fact, for the purposes of describing the dynamics
it does not matter; i.e., $V(t, \mathbf{q})$ and $V(t, \mathbf{q}) + c_1$ produce the same results for
any constant $c_1$. We are optimizing J and the addition of a constant in the
Lagrangian simply alters the functional $J(\mathbf{q})$ to $\hat{J}(\mathbf{q}) = J(\mathbf{q}) + \text{const}$. If one
functional is stationary at $\mathbf{q}$ the other must also be stationary at $\mathbf{q}$.

In the lore of classical mechanics there is another variational principle 
that is sometimes called the “Principle of Least Action” or “Maupertuis’ 







Principle,” which predates Hamilton’s Principle. This principle is sometimes 
confused with Hamilton’s and the situation is not mitigated by the fact that 
Hamilton’s Principle is sometimes called the Principle of Least Action. 8 Maupertuis’
Principle concerns systems that are conservative. In a conservative
system we have that the total energy of the system at any time t along the
path of motion is constant. In other words, T + V = k, where k is a constant.
For this special case L = T − V = 2T − k, and Hamilton’s Principle leads to
Maupertuis’ Principle that the functional

$$K(\mathbf{q}) = \int_{t_0}^{t_1} T(\mathbf{q}, \mathbf{q}') \, dt$$

is stationary along a path of motion. Hence, Maupertuis’ Principle is a special 
case of Hamilton’s Principle. Most books on classical mechanics discuss these 
principles (along with others). Lanczos [48] gives a particularly complete and 
readable account that, in addition to mechanics, deals with the history and 
philosophy of these principles. The eminent scientist E. Mach [51] also writes 
at length about the history, significance, and philosophy underlying these 
principles. His perspective and sympathies are somewhat different from those 
of Lanczos. 9 



1.4 Some Variational Problems from Geometry 

1.4.1 Dido’s Problem 

Dido was a Carthaginian queen (ca. 850 B.C.?) who came from a dysfunctional 
family. Her brother, Pygmalion, murdered her husband (who was also her 
uncle) and Dido, with the help of various gods, fled to the shores of North 
Africa with Pygmalion in pursuit. Upon landing in North Africa, legend has it 
that she struck a deal with a local chief to procure as much land as an oxhide 
could contain. She then selected an ox and cut its hide into very narrow strips, 
which she joined together to form a thread of oxhide more than two and a half 
miles long. Dido then used the oxhide thread and the North African sea coast 
to define the perimeter of her property. It is not clear what the immediate 
reaction of the chief was to this particular interpretation of the deal, but it is 

8 The translators of Landau and Lifshitz [49], p. 131, go so far as to draft a table 
to elucidate the different usages. 

9 Mach is not so generous with Maupertuis. In connexion with Maupertuis’ Prin- 
ciple he writes, “It appears that Maupertuis reached this obscure expression by 
an unclear mingling of his ideas of vis viva and the principle of virtual velocities” 
(p. 365). In defense of Mach, we must note that Maupertuis suffered no lack of 
critics even in his own day. Voltaire wrote the satire Histoire du docteur Akakia et 
du natif de Saint-Malo about Maupertuis. The situation at Frederick the Great’s
court regarding Maupertuis, König, and Voltaire is the stuff of soap operas (see
Pars [59] p. 634). 







clear that Dido sought to enclose the maximum area within her ox and the 
sea. The city of Carthage was then built within the perimeter defined by the 
thread and the sea coast. Dido called the place Byrsa meaning hide of bull . 10 

The problem that Dido faced on the shores of North Africa (aside from 
family difficulties) was to determine the optimal path along which to place 
the oxhide thread so as to provide Byrsa with the maximum amount of land. 
Dido did not have the luxury of waiting some 2500 years for the calculus of 
variations to develop and thus settled for an “intuitive solution.” 

Dido’s problem entailed determining the curve $\gamma$ of fixed length (the
thread) such that the area enclosed by $\gamma$ and a given curve $\sigma$ (the North
African shoreline) is maximum. Although this is perhaps the original version
of Dido’s problem, the term has been used to cover the more basic problem:
among all closed curves in the plane of perimeter L determine the curve that
encloses the maximum area. The problem did not escape the attention of ancient
mathematicians, and as early as perhaps 200 B.C. the mathematician
Zenodorus 11 is credited with a proof that the solution is a circle. Unfortunately,
there were some technical loopholes in Zenodorus’ proof (he compared
the area of a circle with that of polygons having the same perimeter). The
first complete proof of this result was given some 2000 years later by Karl
Weierstraß in his Berlin lectures.

Prior to Weierstraß, Steiner (ca. 1841) proved that if there exists a “figure”
$\gamma$ whose area is never less than that of any other “figure” of the same
perimeter, then $\gamma$ is a circle. Not content with one proof, Steiner gave five
proofs of this result. The proofs are based on simple geometric considerations
(no calculus of variations). The operative word in the statement of his result,
however, is “if.” Steiner’s contemporary, Dirichlet, pointed out that his proofs
do not actually establish the existence of such a figure. Weierstraß and his followers
resolved these subtle aspects of the problem. A lively account of Dido’s
problem and the first of Steiner’s proofs can be found in Körner [45].

Some simple geometrical arguments can be used to show that if $\gamma$ is a
simple closed curve solution to Dido’s problem then $\gamma$ is convex (cf. Körner,
op. cit.). This means that a chord joining any two points on $\gamma$ lies within $\gamma$

10 The reader will find various bits and pieces of Dido’s history scattered in Latin 
works by authors such as Justin and Virgil. One account of the hide story comes 
from the Aeneid, Bk. I, vs. 367. The story gets even better once Aeneas arrives on 
the scene. Finally, good ideas never die. It is said that the Anglo-Saxon chieftains 
Hengist and Horsa (ca. 449 A.D.) acquired their land by circling it with oxhide 
strips [37]. Beware of real estate transactions that involve an ox. 

11 The proof may have been known even earlier, but Zenodorus in any event is 
the author of the proof that appears in the commentary of Theon to Ptolemy’s 
Almagest. Zenodorus quotes Archimedes (who died in 212 B.C.) and is quoted 
by Pappus (ca. 340 A.D.). Aside from these rough dates we do not know exactly 
when Zenodorus lived. At any rate, the solution was of little comfort to Dido’s 
heirs as the Romans obliterated Carthage/Byrsa in the third Punic war just after
200 B.C. and sowed salt on the scorched ground so that nothing would grow. 







and the area enclosed by $\gamma$. The convexity of $\gamma$ is then used to show that
Dido’s problem can be distilled down to the problem of finding a function
$y : [x_0, x_1] \to \mathbb{R}$ such that

$$A(y) = \int_{x_0}^{x_1} y(x) \, dx$$

is maximum subject to the constraint that the arclength of the curve $\gamma^+$
described by y is L/2. If we assume that y is at least piecewise differentiable
then this amounts to the condition

$$\frac{L}{2} = \int_{x_0}^{x_1} \sqrt{1 + y'^2} \, dx.$$

The problem with this formulation is that we do not know the limits of the
integral. The geometrical character of the problem indicates that we do not
need to know both $x_0$ and $x_1$ (we could always normalize the construction so
that $x_0 = 0 < x_1$), but we do need to know $x_1 - x_0$. This problem is effectively
the opposite of the problem we had with the first formulation of the catenary.
Since we know arclength, a natural formulation to use would be one in terms
of arclength.

Suppose that $\gamma^+$ is described parametrically by $(x(s), y(s))$, $s \in [0, L/2]$,
where s is arclength. Suppose further that x and y are at least piecewise
differentiable. Green’s theorem in the plane can then be used to show that
the area of the set enclosed by $\gamma^+$ and the x-axis is

$$A(y) = \int_0^{L/2} y(s) \sqrt{1 - y'^2(s)} \, ds, \tag{1.12}$$

where we have used the relation $x'^2(s) + y'^2(s) = 1$. The basic Dido problem
is thus to determine a positive function $y : [0, L/2] \to \mathbb{R}$ such that A is
maximum.
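To make the arclength formulation (1.12) concrete, the following sketch evaluates A(y) numerically for two curves of the same length L/2: a semicircle and an isosceles triangle. The function name `dido_area`, the value $L = 2\pi$, and the two test curves are assumptions made for illustration; both curves have $x'(s) \ge 0$, as the formula requires.

```python
import math

def dido_area(y, dy, L, n=20000):
    """Midpoint-rule approximation of (1.12): the integral over [0, L/2]
    of y(s) * sqrt(1 - y'(s)^2) ds, where s is arclength."""
    h = (L / 2.0) / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += y(s) * math.sqrt(max(0.0, 1.0 - dy(s) ** 2)) * h
    return total

L = 2.0 * math.pi            # total perimeter; the semicircle then has radius 1
R = L / (2.0 * math.pi)

# Semicircle of radius R parametrized by arclength.
area_semicircle = dido_area(lambda s: R * math.sin(s / R),
                            lambda s: math.cos(s / R), L)

# Isosceles triangle: two legs of length L/4 inclined at 45 degrees.
a = math.sin(math.pi / 4.0)
area_triangle = dido_area(lambda s: a * s if s <= L / 4.0 else a * (L / 2.0 - s),
                          lambda s: a if s <= L / 4.0 else -a, L)
print(area_semicircle, area_triangle)
```

The semicircle encloses $\pi R^2 / 2$, which exceeds the area under the triangle of the same length, consistent with the classical solution.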



1.4.2 Geodesics 

Let $\Sigma$ be a surface, and let $p_0$, $p_1$ be two distinct points on $\Sigma$. The geodesic
problem concerns finding the curve(s) on $\Sigma$ with endpoints $p_0$, $p_1$ for which
the arclength is minimum. A curve having this property is called a geodesic. 
The theory of geodesics is one of the most developed subjects in differential 
geometry. The general theory is complicated analytically by the situation that 
simple, common surfaces such as the sphere require more than one vector 
function to describe them completely. In the language of geometry, the sphere 
is a manifold that requires at least two charts. We have encountered and side- 
stepped the analogous problem for curves, and we do so here in the interest of 
simplicity. We focus on the local problem and refer the reader to any general 







text on differential geometry such as Stoker [66] or Willmore [75] for a more 
precise and in-depth treatment of geodesics. 12 

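Suppose that $\Sigma$ is described by the position vector function $\mathbf{r} : \sigma \to \mathbb{R}^3$,
where $\sigma$ is a nonempty connected open subset of $\mathbb{R}^2$, and for $(u, v) \in \sigma$,

$$\mathbf{r}(u, v) = (x(u, v), y(u, v), z(u, v)).$$

We assume that $\mathbf{r}$ is a smooth function on $\sigma$; i.e., x, y, and z are smooth
functions of (u, v), and that

$$\frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \neq \mathbf{0}, \tag{1.13}$$

so that $\mathbf{r}$ is a one-to-one mapping of $\sigma$ onto $\Sigma$. If $\gamma$ is a curve on $\Sigma$, then
there is a curve $\hat{\gamma}$ in $\sigma$ that maps to $\gamma$ under $\mathbf{r}$. Any curve on $\Sigma$ may thus
be regarded as a curve in $\sigma$. Suppose that the points $p_0$ and $p_1$ correspond
to $\mathbf{r}_0 = \mathbf{r}(u_0, v_0)$ and $\mathbf{r}_1 = \mathbf{r}(u_1, v_1)$, respectively. Any curve $\gamma$ from $\mathbf{r}_0$ to $\mathbf{r}_1$
maps to a curve $\hat{\gamma}$ from $(u_0, v_0)$ to $(u_1, v_1)$.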

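For the geodesic problem we restrict our attention to smooth simple curves
(no self-intersections) on $\Sigma$ from $\mathbf{r}_0$ to $\mathbf{r}_1$. Let $\Gamma$ denote the set of all such
curves. Thus, if $\gamma \in \Gamma$, then there exists a parametrization of $\gamma$ of the form

$$\mathbf{R}(t) = \mathbf{r}(u(t), v(t)), \qquad t \in [t_0, t_1], \tag{1.14}$$

where $\mathbf{R}(t_0) = \mathbf{r}_0$, $\mathbf{R}(t_1) = \mathbf{r}_1$, and u and v are smooth functions on the
interval $[t_0, t_1]$ such that

$$u'^2(t) + v'^2(t) \neq 0 \tag{1.15}$$

for all $t \in [t_0, t_1]$. In the parameter space $\sigma$, the last condition ensures that
the curve $\hat{\gamma}$ is also a smooth curve and has a well-defined unit tangent vector.
The differential of arclength along $\gamma$ is given by

$$ds^2 = |\mathbf{R}'(t)|^2 \, dt^2 = \left| \frac{\partial \mathbf{r}}{\partial u} u'(t) + \frac{\partial \mathbf{r}}{\partial v} v'(t) \right|^2 dt^2 = \left( E u'^2 + 2 F u' v' + G v'^2 \right) dt^2,$$

where

$$E = \left| \frac{\partial \mathbf{r}}{\partial u} \right|^2, \qquad F = \frac{\partial \mathbf{r}}{\partial u} \cdot \frac{\partial \mathbf{r}}{\partial v}, \qquad G = \left| \frac{\partial \mathbf{r}}{\partial v} \right|^2.$$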

The functions E, F, and G are called components of the first fundamental
form or metric tensor. Note that these components depend only on u and
v. Note also that the identity

$$\left| \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right|^2 = EG - F^2$$

and condition (1.13) indicate that the quadratic form

$$I = E u'^2 + 2 F u' v' + G v'^2$$

is positive definite.

12 A more specialized discussion can be found in Postnikov [62].

The arclength of $\gamma$ is given by

$$L(\gamma) = \int_{t_0}^{t_1} \sqrt{E u'^2 + 2 F u' v' + G v'^2} \, dt.$$

The geodesic problem is thus to find the functions u and v (i.e., the curve $\gamma$)
such that L is a minimum and

$$u(t_0) = u_0, \quad v(t_0) = v_0, \qquad u(t_1) = u_1, \quad v(t_1) = v_1.$$

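Example 1.4.1: Geodesics on a Sphere

Let $\Sigma$ be an octant of the unit sphere. The surface $\Sigma$ can be described parametrically by

$$\mathbf{r}(u, v) = (\sin u \cos v, \sin u \sin v, \cos u)$$

for $\sigma = \{(u, v) : 0 < u < \pi/2, \ 0 < v < \pi/2\}$. Now,

$$E = \left| \frac{\partial \mathbf{r}}{\partial u} \right|^2 = |(\cos u \cos v, \cos u \sin v, -\sin u)|^2 = 1,$$

$$F = \frac{\partial \mathbf{r}}{\partial u} \cdot \frac{\partial \mathbf{r}}{\partial v} = (\cos u \cos v, \cos u \sin v, -\sin u) \cdot (-\sin u \sin v, \sin u \cos v, 0) = 0,$$

$$G = \left| \frac{\partial \mathbf{r}}{\partial v} \right|^2 = |(-\sin u \sin v, \sin u \cos v, 0)|^2 = \sin^2 u.$$

The arclength integral is thus

$$L(\gamma) = \int_{t_0}^{t_1} \sqrt{u'^2 + v'^2 \sin^2 u} \, dt.$$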
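The arclength integral for the sphere can be evaluated numerically to compare candidate curves. The sketch below is illustrative only: the endpoints, the function names, and the polyline quadrature are assumptions. It measures two curves joining the same pair of points in the octant, namely an arc of a circle of latitude (u constant) and an arc of a great circle, and compares both with the angle between the endpoint position vectors.

```python
import math

def sphere_length(curve, n=4000):
    """Approximate L(gamma) = integral of sqrt(u'^2 + v'^2 sin^2 u) dt for a
    curve t -> (u(t), v(t)), t in [0, 1], on the unit sphere (E = 1, F = 0,
    G = sin^2 u), with the metric evaluated at segment midpoints."""
    total = 0.0
    u_prev, v_prev = curve(0.0)
    for k in range(1, n + 1):
        u, v = curve(k / n)
        u_mid = 0.5 * (u + u_prev)
        total += math.sqrt((u - u_prev) ** 2 + (math.sin(u_mid) * (v - v_prev)) ** 2)
        u_prev, v_prev = u, v
    return total

def r(u, v):
    """Position vector on the unit sphere."""
    return (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))

u0, v0, v1 = math.pi / 3, math.pi / 8, 3 * math.pi / 8   # sample endpoints
p0, p1 = r(u0, v0), r(u0, v1)
angle = math.acos(sum(a * b for a, b in zip(p0, p1)))    # angle between p0 and p1

def parallel(t):
    """Circle of latitude u = u0 joining the two points."""
    return (u0, v0 + t * (v1 - v0))

def great_circle(t):
    """Great-circle arc from p0 to p1, converted back to (u, v)."""
    w0 = math.sin((1.0 - t) * angle) / math.sin(angle)
    w1 = math.sin(t * angle) / math.sin(angle)
    x, y, z = (w0 * a + w1 * b for a, b in zip(p0, p1))
    return (math.acos(z), math.atan2(y, x))

len_parallel = sphere_length(parallel)
len_great = sphere_length(great_circle)
print(len_parallel, len_great, angle)
```

The great-circle arc is the shorter of the two, and its length agrees with the angle between the endpoints, as expected on a unit sphere.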



A feature of the basic geodesic problem described above is that it does 
not involve the function r directly. The arclength of a curve depends only on 
the three scalar functions E, F, and G. Geodesics are part of the intrinsic 
geometry of the surface, i.e., the geometry defined by the metric tensor. The 
metric tensor does not define a surface uniquely even modulo translations and 







rotations. There are any number of distinct surfaces in R 3 that have the same 
metric tensor. For example, a plane, a cone, and a cylinder all have the same 
metric tensor. If a cylinder is “unrolled” and “flattened” to form a portion of 
the plane, then a geodesic on the cylinder would become a geodesic on the 
plane. 

One direction for a generalization of the above problem is to focus on the
space $\sigma \subset \mathbb{R}^2$ and define the components of the metric tensor. For notational
simplicity, let $u = u^1$, $v = u^2$, and $\mathbf{u} = (u^1, u^2)$. We can choose scalar functions
$g_{jk} : \sigma \to \mathbb{R}$, j, k = 1, 2, and define the arclength element ds by

$$ds^2 = g_{11} (du^1)^2 + g_{12} \, du^1 du^2 + g_{21} \, du^2 du^1 + g_{22} (du^2)^2 = g_{jk} \, du^j du^k,$$

where the last expression uses the Einstein summation convention: summation
of repeated indices when one is a superscript and the other is a subscript. Of
course we must place some restrictions on the $g_{jk}$ in order to ensure that our
arclength element is positive and that the length of a curve does not depend
on the choice of coordinates $\mathbf{u}$. We can take care of these concerns by requiring
that the $g_{jk}$ produce a quadratic form that is positive definite and that the
$g_{jk}$ form a second-order covariant tensor. To mimic the earlier case we also
impose the symmetry condition

$$g_{jk} = g_{kj},$$

so that

$$ds^2 = g_{11} (du^1)^2 + 2 g_{12} \, du^1 du^2 + g_{22} (du^2)^2. \tag{1.16}$$

In terms of the former notation, $E = g_{11}$, $F = g_{12} = g_{21}$, and $G = g_{22}$. For
this case, the positive definite requirement amounts to the condition

$$g_{11} g_{22} - g_{12}^2 > 0$$

with $g_{11} > 0$. The condition that the $g_{jk}$ form a second-order covariant tensor
means that under a smooth coordinate transformation from $\mathbf{u} = (u^1, u^2)$ to
$\hat{\mathbf{u}} = (\hat{u}^1, \hat{u}^2)$, the components $g_{jk}(\mathbf{u})$ transform to $\hat{g}_{lm}(\hat{\mathbf{u}})$ according to the
relation

$$\hat{g}_{lm} = g_{jk} \frac{\partial u^j}{\partial \hat{u}^l} \frac{\partial u^k}{\partial \hat{u}^m}.$$

The set $\sigma$ equipped with such a tensor can be viewed as defining a geometrical
object in itself (as the surface $\Sigma$ was). It is a special case of what is called a
Riemannian manifold. Let $\mathcal{M}$ denote this geometrical object. A curve $\hat{\gamma}$ in
$\sigma$ generates a curve $\gamma$ in $\mathcal{M}$, and the arclength is given by

$$L(\gamma) = \int_{t_0}^{t_1} \sqrt{g_{jk} \, u^{j\prime} u^{k\prime}} \, dt,$$

where $(u^1(t), u^2(t))$, $t \in [t_0, t_1]$ is a parametrization of $\hat{\gamma}$. The condition that
the $g_{jk}$ form a second-order covariant tensor ensures that $L(\gamma)$ is invariant







with respect to changes in the curvilinear coordinates $\mathbf{u}$ used to represent
$\mathcal{M}$. Note also that $L(\gamma)$ is invariant with respect to orientation-preserving
parametrizations of $\gamma$.
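The coordinate invariance of the arclength can be illustrated with the Euclidean plane. In Cartesian coordinates the metric components are $g_{11} = g_{22} = 1$, $g_{12} = 0$; under the change to polar coordinates $(\rho, \theta)$ the transformation law gives $g_{11} = 1$, $g_{12} = 0$, $g_{22} = \rho^2$. The sketch below, in which the sample curve, the function names, and the polyline quadrature are all assumptions made for illustration, computes the length of one curve in both coordinate systems.

```python
import math

def length(curve, metric, n=4000):
    """Approximate L = integral of sqrt(g_jk u^j' u^k') dt over t in [0, 1].
    `curve(t)` returns coordinates (u1, u2); `metric(u1, u2)` returns the
    symmetric components (g11, g12, g22), evaluated at segment midpoints."""
    total = 0.0
    prev = curve(0.0)
    for k in range(1, n + 1):
        cur = curve(k / n)
        d1, d2 = cur[0] - prev[0], cur[1] - prev[1]
        g11, g12, g22 = metric(0.5 * (cur[0] + prev[0]), 0.5 * (cur[1] + prev[1]))
        total += math.sqrt(g11 * d1 * d1 + 2.0 * g12 * d1 * d2 + g22 * d2 * d2)
        prev = cur
    return total

def cartesian_curve(t):
    """Sample curve (x, y) = (1 + t, t^2)."""
    return (1.0 + t, t * t)

def polar_curve(t):
    """The same curve expressed in polar coordinates (rho, theta)."""
    x, y = cartesian_curve(t)
    return (math.hypot(x, y), math.atan2(y, x))

euclidean = lambda x, y: (1.0, 0.0, 1.0)
polar = lambda rho, theta: (1.0, 0.0, rho * rho)   # transformed components

len_cartesian = length(cartesian_curve, euclidean)
len_polar = length(polar_curve, polar)
exact = 0.5 * math.sqrt(5.0) + 0.25 * math.asinh(2.0)   # integral of sqrt(1 + 4 t^2)
print(len_cartesian, len_polar, exact)
```

The two computed lengths agree with each other and with the closed-form value, up to discretization error.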

The advantage of the above abstraction is that it can be readily modified to
accommodate higher dimensions. Suppose that $\sigma \subset \mathbb{R}^n$ and $\mathbf{u} = (u^1, \ldots, u^n)$.
We can define an n-dimensional (Riemannian) manifold $\mathcal{M}$ by introducing a
metric tensor with components $g_{jk}$ such that:

1. the quadratic form $g_{jk} \, du^j du^k$ is positive definite;

2. $g_{jk} = g_{kj}$ for j, k = 1, 2, . . . , n;

3. under any smooth transformation $\hat{\mathbf{u}} = \hat{\mathbf{u}}(\mathbf{u})$ the $g_{jk}$ transform to $\hat{g}_{lm}$
according to the relation

$$\hat{g}_{lm} = g_{jk} \frac{\partial u^j}{\partial \hat{u}^l} \frac{\partial u^k}{\partial \hat{u}^m}.$$

A curve $\gamma$ on $\mathcal{M}$ is generated by a curve $\hat{\gamma}$ in $\sigma \subset \mathbb{R}^n$. Suppose that $\mathbf{u}(t) =
(u^1(t), \ldots, u^n(t))$, $t \in [t_0, t_1]$ is a parametrization of $\hat{\gamma}$. The arclength of $\gamma$ is
then defined as

$$L(\gamma) = \int_{t_0}^{t_1} \sqrt{g_{jk} \, u^{j\prime} u^{k\prime}} \, dt.$$

A generalization of the geodesic problem is thus to find the curve(s) $\hat{\gamma}$ in $\sigma$
with specified endpoints $\mathbf{u}_0 = \mathbf{u}(t_0)$, $\mathbf{u}_1 = \mathbf{u}(t_1)$ such that $L(\gamma)$ is a minimum.

Geodesics are of interest not only in differential geometry, but also in 
mathematical physics and other subjects. It turns out that many problems 
can be interpreted as geodesic problems on a suitably defined manifold. 13 In 
this regard, the geodesic problem is even more important because it provides 
a unifying framework for many problems. 

1.4.3 Minimal Surfaces 

We have already encountered a special minimal surface problem in our dis- 
cussion of the catenary. The rotational symmetry of the problem reduced the 
problem to that of finding a function y of a single variable x, the graph of 
which generates the surface of revolution having minimal surface area. Locally, 
any surface can be represented in “graphical” form, 

$$\mathbf{r}(x, y) = (x, y, z(x, y)), \tag{1.17}$$

where r is the position function in R 3 . Unless some symmetry condition is 
imposed, a surface parametrization requires two independent variables. Thus 
the problem of finding a surface with minimal surface area involves two inde- 
pendent variables in contrast to the problems discussed earlier. 

13 In the theory of relativity, where differential geometry is widely used, the condi- 
tion that the metric tensor be positive definite is relaxed to positive semidefinite. 







Given a simple closed space curve $\gamma$, the basic minimal surface problem
entails finding, among all smooth simply connected surfaces with $\gamma$ as a boundary,
the surface having minimal surface area. Suppose that the curve $\gamma$ can be
represented parametrically by $(x(t), y(t), z(t))$ for $t \in [t_0, t_1]$, and for simplicity
suppose that the projection of $\gamma$ on the xy-plane is also a simple closed
curve; i.e., the curve $\hat{\gamma}$ described by $(x(t), y(t))$ for $t \in [t_0, t_1]$ is a simple closed
curve in the xy-plane. Let $\Omega$ denote the region in the xy-plane enclosed by $\hat{\gamma}$.
Suppose further that we restrict the class of surfaces under consideration to
those that can be represented in the form (1.17), where z is a smooth function
for $(x, y) \in \Omega$. The differential area element is given by

$$dA = \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^2 + \left( \frac{\partial z}{\partial y} \right)^2} \, dx \, dy,$$

and the surface area is thus

$$A(z) = \iint_{\Omega} \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^2 + \left( \frac{\partial z}{\partial y} \right)^2} \, dx \, dy.$$

The (simplified) minimal surface problem thus concerns determining a smooth
function $z : \Omega \to \mathbb{R}$ such that $z(x(t), y(t)) = z(t)$ for $t \in [t_0, t_1]$, and A(z) is a
minimum. There is a substantial body of information about minimal surfaces.
The reader can find an overview of the subject in Osserman [58].
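As a small numerical illustration of the area functional for graphs z(x, y), the sketch below takes $\Omega$ to be the unit square with boundary values z = 0 and compares the flat surface with a bumped surface having the same boundary values. The domain, the perturbation $z = \varepsilon \sin(\pi x) \sin(\pi y)$, and the function name `surface_area` are assumptions chosen for the example (the boundary of a square is only piecewise smooth, which does not matter for this comparison).

```python
import math

def surface_area(zx, zy, n=200):
    """Midpoint-rule approximation of A(z) = double integral over the unit
    square of sqrt(1 + z_x^2 + z_y^2) dx dy, given the partial derivatives."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += math.sqrt(1.0 + zx(x, y) ** 2 + zy(x, y) ** 2) * h * h
    return total

eps = 0.1
# Partial derivatives of z = eps * sin(pi x) * sin(pi y), which vanishes
# on the boundary of the square.
bump_x = lambda x, y: eps * math.pi * math.cos(math.pi * x) * math.sin(math.pi * y)
bump_y = lambda x, y: eps * math.pi * math.sin(math.pi * x) * math.cos(math.pi * y)

area_flat = surface_area(lambda x, y: 0.0, lambda x, y: 0.0)
area_bump = surface_area(bump_x, bump_y)
print(area_flat, area_bump)
```

The bumped surface has the larger area, roughly $1 + \varepsilon^2 \pi^2 / 4$ for small $\varepsilon$, as one would expect when the boundary curve lies in a plane.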



1.5 Optimal Harvest Strategy 

Our final example in this chapter concerns a problem in economics dealing 
with finding a harvest strategy that maximizes profit. Here, we follow the 
example given by Wan [71], p. 6 and use a fishery to illustrate the model. 

Let y(t) denote the total tonnage of fish at time t in a region $\Omega$ of the
ocean, and let $y_c$ denote the carrying capacity of the region $\Omega$ for the fish.
The growth of the fish population without any harvesting is typically modelled
by a first-order differential equation

$$y'(t) = f(t, y). \tag{1.18}$$

If y is small compared to $y_c$, then f is often approximated by a linear function
in y; i.e., $f(t, y) = ky + g(t)$, where k is a constant. More complicated models
are available for a wider range of y(t) such as logistic growth

$$f(t, y) = k y(t) \left( 1 - \frac{y(t)}{y_c} \right).$$

The ordinary differential equation (1.18) is accompanied by an initial condition

$$y(0) = y_0 \tag{1.19}$$

that reflects the initial fish population.

Suppose now that the fish are harvested at a rate w(t). Equation (1.18)
for the population growth can then be modified to the relation

$$y'(t) = f(t, y) - w(t). \tag{1.20}$$

Given the function f, the problem is to determine the function w so that the
profit in a given time interval T is maximum.

It is reasonable to expect that the cost of harvesting the fish depends on 
the season, the fish population, and the harvest rate. Let c(t,y,w) denote 
the cost to harvest a unit of fish biomass. Suppose that the fish commands a 
price p per unit fish biomass and that the price is perhaps season dependent, 
but not dependent on the volume of fish on the market. The profit gained by 
harvesting the fish in a small time increment is $(p(t) - c(t, y, w)) w(t) \, dt$. Given
a fixed period T with which to plan the strategy, the total profit is thus

$$P(y, w) = \int_0^T \left( p(t) - c(t, y, w) \right) w(t) \, dt.$$

The problem is to identify the function w so that P is maximum. 

The above problem is an example of a constrained variational problem. The 
functional P is optimized subject to the constraint defined by the differential 
equation (1.20) (a nonholonomic constraint) and initial condition (1.19). We 
can convert the problem into an unconstrained one by simply eliminating 
w from the integrand defining P using equation (1.20). The problem then 
becomes the determination of a function y that maximizes the total profit. 
This approach is not necessarily desirable because we want to keep track of 
w, the only physical quantity we can regulate. 

A feature of this problem that distinguishes it from earlier problems is the 
absence of a boundary condition for the fish population at time T. Although 
we are given the initial fish population, it is not necessarily desirable to specify 
the final fish population after time T. As Wan points out, the condition $y(T) = 0$,
for example, is not always the best strategy: “green issues” aside, it may cost
far more to harvest the last few fish than they are worth. This simple model 
thus provides an example of a variational problem with only one endpoint 
fixed in contrast to the catenary and brachystochrone. 

In passing we note that economic models such as this one are generally 
framed in terms of “present value.” A pound sterling invested earns interest, 
and this should be incorporated into the overall profit. If the interest is compounded
continuously at a rate r, then a pound invested yields $e^{rt}$ pounds
after time t. Another way of looking at this is to view a pound of income at
time t as worth $e^{-rt}$ pounds now. Considerations of this sort lead to profit
functionals of the form

$$P(y, w) = \int_0^T e^{-rt} \left( p(t) - c(t, y, w) \right) w(t) \, dt.$$
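The harvest model can be explored with a simple simulation. The sketch below integrates the constrained growth equation (1.20) with logistic f by the forward Euler method and accumulates the discounted profit along the way. Everything specific in it is a hypothetical choice made for illustration: the function name `simulate`, the parameter values, the constant price, and in particular the unit cost c = 0.1/y, which is merely one way to make harvesting dearer when fish are scarce.

```python
import math

def simulate(harvest, T=5.0, n=5000, y0=0.2, k=1.0, yc=1.0, r=0.05,
             price=1.0, unit_cost=lambda t, y, w: 0.1 / y):
    """Forward-Euler integration of y' = k y (1 - y/yc) - w(t) together with
    the discounted profit integral. Returns (y(T), profit)."""
    h = T / n
    y, profit = y0, 0.0
    for i in range(n):
        t = i * h
        w = harvest(t)
        profit += math.exp(-r * t) * (price - unit_cost(t, y, w)) * w * h
        y += h * (k * y * (1.0 - y / yc) - w)
    return y, profit

# No harvesting versus a constant, moderate harvest rate.
y_free, p_free = simulate(lambda t: 0.0)
y_mod, p_mod = simulate(lambda t: 0.1)

# Closed-form logistic solution at t = T for the unharvested population.
y_exact = 1.0 / (1.0 + (1.0 / 0.2 - 1.0) * math.exp(-5.0))
print(y_free, y_exact, y_mod, p_mod)
```

Comparing several harvest functions w in this way gives a feel for the trade-off in P(y, w), although it is no substitute for the variational analysis: here w is prescribed rather than optimized.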




2 



The First Variation 



In this chapter we develop a necessary condition for a function to yield an 
extremum for a functional. The centrepiece of the chapter is a second-order 
differential equation, the Euler-Lagrange equation, which plays a role analo- 
gous to the gradient of a function. We first motivate the analysis by reviewing 
necessary conditions for functions to have local extrema. The Euler-Lagrange 
equations are derived in Section 2.2 and some special cases where the differ- 
ential equation can be simplified are discussed in Section 2.3. The remaining 
three sections are devoted to more qualitative topics concerning degenerate 
cases, invariance, and existence of solutions. We postpone a discussion of suf- 
ficient conditions until Chapter 10. 



2.1 The Finite-Dimensional Case 

The theory underlying the necessary conditions for extrema in the calculus 
of variations is motivated by that for functions of n independent variables. 
Problems in the calculus of variations are inherently infinite-dimensional. The 
character of the analytical tools needed to solve infinite-dimensional problems 
differs from that required for finite-dimensional problems, but many of the 
underlying ideas have tractable analogues in finite dimensions. In this section 
we review a necessary condition for a function of n independent variables to 
have a local extremum. 

2.1.1 Functions of One Variable 

Let f be a real-valued function defined on the interval $I \subset \mathbb{R}$. The function
$f : I \to \mathbb{R}$ is said to have a local maximum at $x \in I$ if there exists a number
$\epsilon > 0$ such that for any $\hat{x} \in (x - \epsilon, x + \epsilon) \subset I$, $f(\hat{x}) \le f(x)$. The function
$f : I \to \mathbb{R}$ is said to have a local minimum at $x \in I$ if $-f$ has a local
maximum at x. A function may have several local extrema in a given interval.







It may be that a function attains a maximum or minimum value for the
entire interval. The function $f : I \to \mathbb{R}$ has a global maximum on $I$ at
$x \in I$ if $f(\hat{x}) \le f(x)$ for all $\hat{x} \in I$. The function $f$ is said to have a global
minimum on $I$ at $x \in I$ if $-f$ has a global maximum at $x$. Note that if $I$
has boundary points then $f$ may have a global maximum on the boundary. If
$f$ is differentiable on $I$ then the presence of local maxima or minima on $I$ is
characterized by the first derivative.

Theorem 2.1.1 Let $f$ be a real-valued function differentiable on the open
interval $I$. If $f$ has a local extremum at $x \in I$ then $f'(x) = 0$.

Proof: The proof of this result is essentially the same for a local maximum
or minimum. Suppose that $x$ is a local maximum. Then there is a number
$\epsilon > 0$ such that for any $\hat{x} \in (x - \epsilon, x + \epsilon) \subset I$ the inequality $f(x) \ge f(\hat{x})$ is
satisfied. Now the derivative of $f$ at $x$ is given by

$$f'(x) = \lim_{\hat{x} \to x} \frac{f(\hat{x}) - f(x)}{\hat{x} - x}.$$

The numerator of this limit is never positive since $f(x)$ is a maximum, but
the denominator is positive when $\hat{x} > x$ and negative when $\hat{x} < x$. The
right-sided limit is therefore nonpositive and the left-sided limit is
nonnegative. Since the function $f$ is differentiable at $x$ the right- and
left-sided limits exist and are equal. The only way this can be true is if
$f'(x) = 0$. □

It is illuminating to examine the situation for smooth functions. We use
the generic term “smooth” to indicate that the function has as many continuous
derivatives as are necessary to perform whatever operations are required.
Suppose that $f$ is smooth in the interval $(x - \epsilon, x + \epsilon)$, where $\epsilon > 0$. Let
$\hat{x} - x = \epsilon\eta$. Taylor’s theorem indicates that, for $\epsilon$ sufficiently small, $f$ can be
represented by

$$f(\hat{x}) = f(x) + \epsilon\eta f'(x) + \frac{\epsilon^2}{2}\eta^2 f''(x) + O(\epsilon^3). \tag{2.1}$$

If $f'(x) \ne 0$ and $\epsilon$ is small, the sign of $f(\hat{x}) - f(x)$ is determined by $\eta f'(x)$.
Suppose that $f'(x) \ne 0$. If $f$ has a local extremum at $x$ then the sign of
$f(\hat{x}) - f(x)$ cannot change in $(x - \epsilon, x + \epsilon)$, so that $\eta f'(x)$ must have the same
sign for all $\eta$. But it is clear that $\eta$ can be positive or negative and hence
$\eta f'(x)$ can be positive or negative. We must therefore have that $f'(x) = 0$. If
$f'(x) = 0$, then the above expansion indicates that the sign of the difference is
that of the quadratic term, i.e., the sign of $f''(x)$. If this derivative is negative
then $f(x)$ is a local maximum; if it is positive then $f(x)$ is a local minimum.
It may be that $f''(x) = 0$. In this case the sign of the difference depends on
the cubic term, which contains a factor $\eta^3$. Like the linear term, however, this
factor can be either positive or negative depending on the choice of $\eta$. Thus, if
$f'''(x) \ne 0$, $f(x)$ cannot be a local extremum. We can continue in this manner
as long as $f$ has the requisite derivatives in $(x - \epsilon, x + \epsilon)$.

For a differentiable function it is easy to see graphically why the condition
$f'(x) = 0$ is necessary for a local extremum. The Taylor expansion for a
smooth function indicates that at any point $x$ at which the first derivative
vanishes an $O(\epsilon)$ change in the independent variable produces an $O(\epsilon^2)$ change
in the function value as $\epsilon \to 0$. For this reason points such as $x$ are called
stationary points. The functions $f_n(x) = x^n$, where $n \in \mathbb{N}$, $x \in \mathbb{R}$, provide
simple paradigms for the various possibilities.

Example 2.1.1: Let $f(x) = 3x^2 - x^3$. The function $f$ is smooth for $x \in \mathbb{R}$
and therefore if any local extrema exist they must satisfy the equation
$6x - 3x^2 = 0$. This equation is satisfied if $x = 0$ or $x = 2$. The second derivative
is $6 - 6x$, so that $f''(0) = 6$ and consequently $f(0)$ is a local minimum. On
the other hand, $f''(2) = -6$ and thus $f(2)$ is a local maximum.
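
The second-derivative reasoning of Example 2.1.1 can be checked mechanically. The following is a minimal sketch, not part of the original text; the helper names `df`, `d2f`, and `classify` are illustrative choices.

```python
def f(x):
    """f(x) = 3x^2 - x^3 from Example 2.1.1."""
    return 3 * x**2 - x**3


def df(x):
    """First derivative f'(x) = 6x - 3x^2."""
    return 6 * x - 3 * x**2


def d2f(x):
    """Second derivative f''(x) = 6 - 6x."""
    return 6 - 6 * x


def classify(x, tol=1e-12):
    """Classify x using the sign of f''(x) at a stationary point."""
    if abs(df(x)) > tol:
        return "not stationary"
    if d2f(x) > 0:
        return "local minimum"
    if d2f(x) < 0:
        return "local maximum"
    return "inconclusive"  # f''(x) = 0: higher-order terms decide


for point in (0.0, 1.0, 2.0):
    print(point, f(point), classify(point))
```

The "inconclusive" branch corresponds to the case $f''(x) = 0$ discussed above, where the cubic or higher terms of the Taylor expansion decide the matter.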



Example 2.1.2: Let

$$f(x) = \begin{cases} x^2 \sin^2(1/x), & \text{if } x \ne 0 \\ 0, & \text{if } x = 0. \end{cases}$$

This function is differentiable for all $x \in \mathbb{R}$. Now $f'(0) = 0$, and thus $x = 0$
is a stationary point but the derivative is not continuous there and so $f''(0)$
does not exist. We can deduce that $f$ has a local minimum at $x = 0$ because
$f(x) \ge 0$ for all $x \in \mathbb{R}$.



Example 2.1.3: Let $f(x) = |x|$. This function is differentiable for all
$x \in \mathbb{R} \setminus \{0\}$. The derivative is given by $f'(x) = -1$ for $x < 0$, and $f'(x) = 1$ for
$x > 0$. Thus $f$ cannot have a local extremum in $\mathbb{R} \setminus \{0\}$. Nonetheless it is clear
that $f(0) = 0$ is a local (and global) minimum for $f$ in $\mathbb{R}$.



Example 2.1.4: Let $f(x) = e^x$. This function is smooth for all $x \in \mathbb{R}$ and
its derivative never vanishes; consequently, $f$ does not have any local extrema.



The relationship between local and global extrema is limited. Certainly if
$f$ has a global extremum at some interior point $x$ of an interval $I$ then $f(x)$
is also a local extremum. If, in addition, $f$ is differentiable in $I$, then it must
also satisfy the condition $f'(x) = 0$. But it may be (as often is the case) that
a global extremum is attained at one of the boundary points of $I$, in which
case even if $f$ is differentiable nothing regarding the value of the derivative
can be asserted.







2.1.2 Functions of Several Variables 



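The definitions for local and global extrema in $n$ dimensions are formally the
same as for the one-variable case. Let $\Omega \subset \mathbb{R}^n$ be a region and suppose that
$f : \Omega \to \mathbb{R}$. For $\epsilon > 0$ and $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, let

$$B(\mathbf{x}; \epsilon) = \{\hat{\mathbf{x}} \in \mathbb{R}^n : |\hat{x}_1 - x_1|^2 + |\hat{x}_2 - x_2|^2 + \cdots + |\hat{x}_n - x_n|^2 < \epsilon^2\}.$$

The function $f : \Omega \to \mathbb{R}$ has a global maximum (global minimum) on
$\Omega$ at $\mathbf{x} \in \Omega$ if $f(\hat{\mathbf{x}}) \le f(\mathbf{x})$ ($f(\hat{\mathbf{x}}) \ge f(\mathbf{x})$) for all $\hat{\mathbf{x}} \in \Omega$. The function $f$
has a local maximum (local minimum) at $\mathbf{x} \in \Omega$ if there exists a number
$\epsilon > 0$ such that for any $\hat{\mathbf{x}} \in B(\mathbf{x}; \epsilon) \subset \Omega$, $f(\hat{\mathbf{x}}) \le f(\mathbf{x})$ ($f(\hat{\mathbf{x}}) \ge f(\mathbf{x})$). As
with the one-variable case, if $\Omega$ has boundary points $f$ may have a global
maximum/minimum on the boundary.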

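Necessary conditions for a smooth function of two independent variables
to have local extrema can be derived from considerations similar to those used
in the single-variable case. Suppose that $f : \Omega \to \mathbb{R}$ is a smooth function on
the region $\Omega \subset \mathbb{R}^2$, and that $f$ has a local extremum at $\mathbf{x} = (x_1, x_2) \in \Omega$.
Then there exists an $\epsilon > 0$ such that $f(\hat{\mathbf{x}}) - f(\mathbf{x})$ does not change sign for all
$\hat{\mathbf{x}} \in B(\mathbf{x}; \epsilon)$. Let $\hat{\mathbf{x}} = \mathbf{x} + \epsilon\boldsymbol{\eta}$, where $\boldsymbol{\eta} = (\eta_1, \eta_2) \in \mathbb{R}^2$. For $\epsilon$ small, Taylor’s
theorem implies

$$f(\hat{\mathbf{x}}) = f(\mathbf{x}) + \epsilon\left\{\eta_1 \frac{\partial f(\mathbf{x})}{\partial x_1} + \eta_2 \frac{\partial f(\mathbf{x})}{\partial x_2}\right\}
+ \frac{\epsilon^2}{2}\left\{\eta_1^2 \frac{\partial^2 f(\mathbf{x})}{\partial x_1^2} + 2\eta_1\eta_2 \frac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_2} + \eta_2^2 \frac{\partial^2 f(\mathbf{x})}{\partial x_2^2}\right\} + O(\epsilon^3),$$

and the sign of $f(\hat{\mathbf{x}}) - f(\mathbf{x})$ is given by the linear term in the Taylor expansion,
unless this term is zero. But, if $\mathbf{x} + \epsilon\boldsymbol{\eta} \in B(\mathbf{x}; \epsilon)$, then $\mathbf{x} - \epsilon\boldsymbol{\eta} \in B(\mathbf{x}; \epsilon)$ and
these points yield different signs for the linear term unless it is zero. If $\mathbf{x}$ is a
local extremum we must therefore have that

$$\eta_1 \frac{\partial f(\mathbf{x})}{\partial x_1} + \eta_2 \frac{\partial f(\mathbf{x})}{\partial x_2} = 0 \tag{2.2}$$

for all $\boldsymbol{\eta} \in \mathbb{R}^2$. In particular, equation (2.2) must hold for the special choices
$\mathbf{e}_1 = (1, 0)$ and $\mathbf{e}_2 = (0, 1)$. The former choice implies that $\partial f/\partial x_1 = 0$ and
the latter choice implies that $\partial f/\partial x_2 = 0$. We thus have that if $f$ has a local
extremum at $\mathbf{x}$ then

$$\nabla f(\mathbf{x}) = \mathbf{0}. \tag{2.3}$$

Geometrically, equation (2.2) implies that the tangent plane to the graph of
$f$ is horizontal at a local extremum. Points $\mathbf{x}$ at which $\nabla f(\mathbf{x}) = \mathbf{0}$ are called
stationary points. If $\mathbf{x}$ is a stationary point and $\hat{\mathbf{x}} = \mathbf{x} + \epsilon\boldsymbol{\eta}$, then $f(\hat{\mathbf{x}}) - f(\mathbf{x})$
is $O(\epsilon^2)$ as $\epsilon \to 0$, in contrast to the generic case where an $O(\epsilon)$ change in the
independent variables produces an $O(\epsilon)$ change in the difference.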







Example 2.1.5: Let $f(x_1, x_2) = x_1^2 - x_2^2 + x_1^3$. The stationary points for
$f$ are given by $\nabla f(x_1, x_2) = (2x_1 + 3x_1^2, -2x_2) = \mathbf{0}$. This equation has two
solutions $(0, 0)$ and $(-2/3, 0)$. It can be shown that $(0, 0)$ produces neither a
local minimum nor a local maximum for $f$ (it is a saddle point). In contrast,
at $(-2/3, 0)$ it can be shown that $f$ has a local maximum.
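
One standard way to confirm the two classifications in Example 2.1.5 is the second-derivative (Hessian) test. The text does not spell this test out at this point, so the sketch below is an assumption about how "it can be shown" might be carried out; the function names are illustrative.

```python
def grad(x1, x2):
    """Gradient of f(x1, x2) = x1^2 - x2^2 + x1^3."""
    return (2 * x1 + 3 * x1**2, -2 * x2)


def hessian(x1, x2):
    """Matrix of second partial derivatives of f (constant off-diagonal zeros)."""
    return ((2 + 6 * x1, 0.0), (0.0, -2.0))


def classify(x1, x2):
    """Second-derivative test for a function of two variables."""
    (a, b), (_, d) = hessian(x1, x2)
    det = a * d - b * b
    if det < 0:
        return "saddle point"
    if det > 0:
        return "local minimum" if a > 0 else "local maximum"
    return "inconclusive"


for point in ((0.0, 0.0), (-2.0 / 3.0, 0.0)):
    print(point, grad(*point), classify(*point))
```

The test is inconclusive when the determinant vanishes, which is exactly what happens for the monkey saddle of the next example.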



Example 2.1.6: The monkey saddle$^1$ is a surface described by
$f(x_1, x_2) = x_2^3 - 3x_1^2 x_2$. If $\mathbf{x}$ is a stationary point for $f$ then the equations

$$-6x_1 x_2 = 0,$$
$$3x_2^2 - 3x_1^2 = 0,$$

must be satisfied and this means that $x_1 = x_2 = 0$. The function $f$ does not
have a local extremum at this point. Note that even the second derivatives at
this point are zero.



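The extension of the above arguments to functions of $n$ independent variables
is straightforward. Let $f : \Omega \to \mathbb{R}$ be a smooth function on the region
$\Omega \subset \mathbb{R}^n$, and suppose that $f$ has a local extremum at $\mathbf{x} \in \Omega$. Then, for $\epsilon > 0$
sufficiently small, the sign of $f(\hat{\mathbf{x}}) - f(\mathbf{x})$ does not change for all $\hat{\mathbf{x}} \in B(\mathbf{x}; \epsilon)$.
Let $\hat{\mathbf{x}} = \mathbf{x} + \epsilon\boldsymbol{\eta}$, where $\boldsymbol{\eta} = (\eta_1, \eta_2, \ldots, \eta_n)$. For $\epsilon$ sufficiently small, Taylor’s
theorem implies

$$f(\hat{\mathbf{x}}) = f(\mathbf{x}) + \epsilon\,\boldsymbol{\eta} \cdot \nabla f(\mathbf{x}) + O(\epsilon^2),$$

and the sign of $f(\hat{\mathbf{x}}) - f(\mathbf{x})$ is determined by the linear term in the Taylor
expansion, provided this term is not zero. But the linear term must be zero
since $\mathbf{x} + \epsilon\boldsymbol{\eta}$ and $\mathbf{x} - \epsilon\boldsymbol{\eta}$ are both in $B(\mathbf{x}; \epsilon)$; hence,

$$\boldsymbol{\eta} \cdot \nabla f(\mathbf{x}) = 0 \tag{2.4}$$

for all $\boldsymbol{\eta} \in \mathbb{R}^n$. The special choices $\mathbf{e}_1 = (1, 0, \ldots, 0)$, $\mathbf{e}_2 = (0, 1, \ldots, 0)$, $\ldots$,
$\mathbf{e}_n = (0, 0, \ldots, 1)$ for $\boldsymbol{\eta}$ yield the $n$ conditions $\partial f/\partial x_k = 0$ at $\mathbf{x}$ for $k = 1, 2, \ldots, n$.
In summary we have the following result.

Theorem 2.1.2 Let $f : \Omega \to \mathbb{R}$ be a smooth function on the region $\Omega \subset \mathbb{R}^n$.
If $f$ has a local extremum at a point $\mathbf{x} \in \Omega$ then

$$\nabla f(\mathbf{x}) = \mathbf{0}. \tag{2.5}$$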



$^1$ The graph of this surface near $\mathbf{x} = \mathbf{0}$ has three valleys and three hills. A monkey
requires a saddle with two depressions for its legs and a third for its tail.







2.2 The Euler-Lagrange Equation 

Local extrema for a functional can be defined in a manner analogous to that
used for functions of $n$ variables. The transition from finite to infinite
dimensional domains, however, carries with it some complications. For instance,
there may be several vector spaces for which the problem is well defined, and
once a function space is chosen, there may be several suitable norms available.
The vector space $C^n[x_0, x_1]$, for example, can be equipped with any of
the $\|\cdot\|_{k,\infty}$ norms, $k = 1, 2, \ldots, n$, or even any $L^p$ norm.$^2$ Unlike the
finite-dimensional case, different norms need not be equivalent and thus may lead to
different extrema. Functions “close” in one norm need not be close in another
norm. In applications, the choice of a vector space and norm forms an integral
part of the mathematical model.

Let $J : X \to \mathbb{R}$ be a functional defined on the function space $(X, \|\cdot\|)$
and let $S \subset X$. The functional $J$ is said to have a local maximum in $S$ at
$y \in S$ if there exists an $\epsilon > 0$ such that $J(\hat{y}) - J(y) \le 0$ for all $\hat{y} \in S$ such
that $\|\hat{y} - y\| < \epsilon$. The functional $J$ is said to have a local minimum in $S$ at
$y \in S$ if $y$ is a local maximum in $S$ for $-J$. In this chapter, the set $S$ is a set
of functions satisfying certain boundary conditions.

Functions $\hat{y} \in S$ in an $\epsilon$-neighbourhood of a function $y \in S$ can be
represented in a convenient way as a perturbation of $y$. Specifically, if $\hat{y} \in S$ and
$\|\hat{y} - y\| < \epsilon$, then there is some $\eta \in X$ such that

$$\hat{y} = y + \epsilon\eta.$$

All the functions in an $\epsilon$-neighbourhood of $y$ can be generated from a suitable
set $H_\epsilon$ of functions $\eta$. Certainly any such $\eta$ must be an element of $X$, but $\eta$
must also be such that $y + \epsilon\eta \in S$. The set $H_\epsilon$ is thus defined by

$$H_\epsilon = \{\eta \in X : y + \epsilon\eta \in S \text{ and } \|\eta\| < 1\}.$$

Since the inequalities defining the extrema must be valid when $\epsilon$ is replaced
by any number $\hat{\epsilon}$ such that $0 < \hat{\epsilon} < \epsilon$, it is clear that $\epsilon$ can always be made
arbitrarily small when convenient. The auxiliary set $H_\epsilon$ can thus be replaced
by the set

$$H = \{\eta \in X : y + \epsilon\eta \in S\},$$

for the purposes of analysis.

At this stage we specialize to a particular class of problem called the fixed
endpoint variational problem,$^3$ and work with the vector space $C^2[x_0, x_1]$
that consists of functions on $[x_0, x_1]$ that have continuous second derivatives.
Let $J : C^2[x_0, x_1] \to \mathbb{R}$ be a functional of the form

$$J(y) = \int_{x_0}^{x_1} f(x, y, y')\,dx,$$

where $f$ is a function assumed to have at least second-order continuous partial
derivatives with respect to $x$, $y$, and $y'$. Given two values $y_0, y_1 \in \mathbb{R}$, the
fixed endpoint variational problem consists of determining the functions
$y \in C^2[x_0, x_1]$ such that $y(x_0) = y_0$, $y(x_1) = y_1$, and $J$ has a local extremum in $S$
at $y \in S$. Here,

$$S = \{y \in C^2[x_0, x_1] : y(x_0) = y_0 \text{ and } y(x_1) = y_1\},$$

and

$$H = \{\eta \in C^2[x_0, x_1] : \eta(x_0) = \eta(x_1) = 0\}$$

(cf. figure 2.1).

$^2$ See Appendix B.1.

$^3$ More accurately, it is called the nonparametric fixed endpoint problem in the
plane.

Suppose that $J$ has a local extremum in $S$ at $y$. For definiteness, let us
assume that $J$ has a local maximum at $y$. Then there is an $\epsilon > 0$ such that
$J(\hat{y}) - J(y) \le 0$ for all $\hat{y} \in S$ such that $\|\hat{y} - y\| < \epsilon$. For any $\hat{y} \in S$ there is an
$\eta \in H$ such that $\hat{y} = y + \epsilon\eta$, and for $\epsilon$ small Taylor’s theorem implies that

$$f(x, \hat{y}, \hat{y}') = f(x, y + \epsilon\eta, y' + \epsilon\eta')
= f(x, y, y') + \epsilon\left\{\eta \frac{\partial f}{\partial y} + \eta' \frac{\partial f}{\partial y'}\right\} + O(\epsilon^2).$$

Here, we regard $f$ as a function of the three independent variables $x$, $y$, and
$y'$, and the partial derivatives in the above expression are all evaluated at the
point $(x, y, y')$. Now,







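$$\begin{aligned}
J(\hat{y}) - J(y) &= \int_{x_0}^{x_1} f(x, \hat{y}, \hat{y}')\,dx - \int_{x_0}^{x_1} f(x, y, y')\,dx \\
&= \int_{x_0}^{x_1} \left\{\left(f(x, y, y') + \epsilon\left\{\eta \frac{\partial f}{\partial y} + \eta' \frac{\partial f}{\partial y'}\right\} + O(\epsilon^2)\right) - f(x, y, y')\right\} dx \\
&= \epsilon \int_{x_0}^{x_1} \left(\eta \frac{\partial f}{\partial y} + \eta' \frac{\partial f}{\partial y'}\right) dx + O(\epsilon^2) \\
&= \epsilon\,\delta J(\eta, y) + O(\epsilon^2).
\end{aligned}$$

The quantity

$$\delta J(\eta, y) = \int_{x_0}^{x_1} \left(\eta \frac{\partial f}{\partial y} + \eta' \frac{\partial f}{\partial y'}\right) dx$$

is called the first variation of $J$. Evidently, if $\eta \in H$ then $-\eta \in H$, and
$\delta J(\eta, y) = -\delta J(-\eta, y)$. For $\epsilon$ small, the sign of $J(\hat{y}) - J(y)$ is determined by
the sign of the first variation, unless $\delta J(\eta, y) = 0$ for all $\eta \in H$. The condition
that $J(y)$ be a local maximum in $S$, however, requires that $J(\hat{y}) - J(y)$ does
not change sign for any $\hat{y} \in S$ such that $\|\hat{y} - y\| < \epsilon$; consequently, if $J(y)$ is
a local maximum then

$$\delta J(\eta, y) = \int_{x_0}^{x_1} \left(\eta \frac{\partial f}{\partial y} + \eta' \frac{\partial f}{\partial y'}\right) dx = 0 \tag{2.6}$$

for all $\eta \in H$. A similar chain of arguments can be used to show that equation
(2.6) must be satisfied for all $\eta \in H$ if $J$ has a local minimum in $S$ at $y$.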
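
The expansion of $J(y + \epsilon\eta) - J(y)$ in powers of $\epsilon$ can be illustrated numerically. The sketch below is not from the text: the integrand $f = y'^2 + y^2$ on $[0, 1]$, the base curve $y = x^2$, and the perturbation $\eta = \sin(\pi x)$ are all assumptions chosen for illustration, and the integrals are approximated with a composite Simpson rule. It compares the difference quotient $(J(y + \epsilon\eta) - J(y))/\epsilon$ with the first variation.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


# Illustrative data (assumptions, not taken from the text).
def f(x, y, yp):
    return yp**2 + y**2


def f_y(x, y, yp):      # partial derivative of f with respect to y
    return 2 * y


def f_yp(x, y, yp):     # partial derivative of f with respect to y'
    return 2 * yp


def y(x):
    return x**2


def yp(x):
    return 2 * x


def eta(x):             # vanishes at x = 0 and x = 1, so eta belongs to H
    return math.sin(math.pi * x)


def etap(x):
    return math.pi * math.cos(math.pi * x)


def J(eps):
    """Value of the functional at the perturbed curve y + eps*eta."""
    return simpson(lambda x: f(x, y(x) + eps * eta(x), yp(x) + eps * etap(x)), 0.0, 1.0)


def first_variation():
    """Integral of eta*f_y + eta'*f_{y'} evaluated along y."""
    return simpson(
        lambda x: eta(x) * f_y(x, y(x), yp(x)) + etap(x) * f_yp(x, y(x), yp(x)),
        0.0, 1.0)


for eps in (1e-1, 1e-2, 1e-3):
    quotient = (J(eps) - J(0.0)) / eps
    print(eps, quotient - first_variation())
```

The printed discrepancy should shrink in proportion to $\epsilon$, which is the $O(\epsilon^2)$ remainder divided by $\epsilon$. Note that the chosen $y$ is not an extremal, so the first variation itself is not expected to vanish here.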

So far we have shown that if $J$ has a local extremum in $S$ at $y$ then equation
(2.6) must be satisfied for all $\eta \in H$. As in the finite-dimensional case, the
converse is not true: satisfaction of equation (2.6) does not necessarily mean
that $y$ produces a local extremum for $J$. If $y$ satisfies equation (2.6) for all
$\eta \in H$, we say that $J$ is stationary at $y$, and following common convention, $y$
is called an extremal for $J$ even though it may not produce a local extremum
for $J$.

Equation (2.6) is the infinite-dimensional analogue of the equation (2.5).
Recall that the condition $\nabla f = \mathbf{0}$ is derived from the fact that $\boldsymbol{\eta} \cdot \nabla f = 0$
must hold for all $\boldsymbol{\eta} \in \mathbb{R}^n$. By a suitable choice of vectors in $\mathbb{R}^n$ it was shown
that each component of $\nabla f$ must vanish separately. A similar strategy can
be used to divorce the necessary condition (2.6) from the arbitrary function
$\eta$. It is not yet clear, however, which special choices of functions in $H$ will
accomplish this. Moreover the integrand in equation (2.6) contains not only
$\eta$ but also $\eta'$ to complicate matters.

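The $\eta'$ term in equation (2.6) can be eliminated using integration by parts.
In detail,

$$\int_{x_0}^{x_1} \eta' \frac{\partial f}{\partial y'}\,dx
= \left[\eta \frac{\partial f}{\partial y'}\right]_{x_0}^{x_1} - \int_{x_0}^{x_1} \eta \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) dx
= -\int_{x_0}^{x_1} \eta \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) dx,$$

where we have used the conditions $\eta(x_0) = 0$ and $\eta(x_1) = 0$. Equation (2.6)
can thus be written

$$\int_{x_0}^{x_1} \eta \left\{\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\right\} dx = 0. \tag{2.7}$$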



Now,

$$\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)
= \frac{\partial f}{\partial y} - \frac{\partial^2 f}{\partial x\,\partial y'} - \frac{\partial^2 f}{\partial y\,\partial y'}\,y' - \frac{\partial^2 f}{\partial y'\,\partial y'}\,y'',$$

and given that $f$ has at least two continuous derivatives, we see that for any
fixed $y \in C^2[x_0, x_1]$ the function $E : [x_0, x_1] \to \mathbb{R}$ defined by

$$E(x) = \frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)$$

is continuous on the interval $[x_0, x_1]$. Here, for a given function $y$ the partial
derivatives defining $E$ are evaluated at the point $(x, y(x), y'(x))$. In fact, $E$ can
be regarded as an element in the Hilbert space $L^2[x_0, x_1]$,$^4$ and since any $\eta \in H$
is also in $L^2[x_0, x_1]$ we can draw a closer analogy with the finite-dimensional
case by noting that equation (2.7) is equivalent to the inner product condition

$$\langle \eta, E \rangle = \int_{x_0}^{x_1} \eta(x) E(x)\,dx = 0 \tag{2.8}$$

for all $\eta \in H$. As with the finite-dimensional case, we can show that the
above condition leads to $E = 0$ by considering a special subset of $H$. First we
establish two technical results.



Lemma 2.2.1 Let $\alpha$ and $\beta$ be two real numbers such that $\alpha < \beta$. Then there
exists a function $\nu \in C^2(\mathbb{R})$ such that $\nu(x) > 0$ for all $x \in (\alpha, \beta)$ and $\nu(x) = 0$
for all $x \in \mathbb{R} - (\alpha, \beta)$.

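Proof: Let

$$\nu(x) = \begin{cases} (x - \alpha)^3 (\beta - x)^3, & \text{if } x \in (\alpha, \beta) \\ 0, & \text{otherwise.} \end{cases}$$

The function $\nu$ clearly has all the properties claimed in the lemma except
perhaps continuous derivatives at $x = \alpha$ and $x = \beta$. Now,

$$\lim_{x \to \alpha^+} \frac{\nu(x) - \nu(\alpha)}{x - \alpha}
= \lim_{x \to \alpha^+} \frac{(x - \alpha)^3 (\beta - x)^3 - 0}{x - \alpha}
= \lim_{x \to \alpha^+} (x - \alpha)^2 (\beta - x)^3 = 0,$$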



$^4$ Hilbert spaces are discussed in Appendix B.2. Any function continuous on the
interval $[x_0, x_1]$ is in this space. There are much “rougher” functions in this space
as well.







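$$\lim_{x \to \alpha^-} \frac{\nu(x) - \nu(\alpha)}{x - \alpha} = \lim_{x \to \alpha^-} \frac{0 - 0}{x - \alpha} = 0,$$

so that $\nu'(\alpha) = 0$. Similarly,

$$\lim_{x \to \alpha^+} \frac{\nu'(x) - \nu'(\alpha)}{x - \alpha}
= \lim_{x \to \alpha^+} \frac{3(x - \alpha)^2 (\beta - x)^2 (\beta + \alpha - 2x) - 0}{x - \alpha}
= \lim_{x \to \alpha^+} 3(x - \alpha)(\beta - x)^2 (\beta + \alpha - 2x) = 0,$$

and

$$\lim_{x \to \alpha^-} \frac{\nu'(x) - \nu'(\alpha)}{x - \alpha} = \lim_{x \to \alpha^-} \frac{0 - 0}{x - \alpha} = 0,$$

so that $\nu''(\alpha) = 0$. Similar arguments can be used to show that $\nu'(\beta) = \nu''(\beta) = 0$.
The second derivative is thus

$$\nu''(x) = \begin{cases} 6(x - \alpha)(\beta - x)\left\{(x - \alpha)^2 + (\beta - x)^2 - 3(x - \alpha)(\beta - x)\right\}, & \text{if } x \in (\alpha, \beta) \\ 0, & \text{otherwise,} \end{cases}$$

and it is clear that

$$\lim_{x \to \alpha} \nu''(x) = \nu''(\alpha) = 0$$

and

$$\lim_{x \to \beta} \nu''(x) = \nu''(\beta) = 0;$$

hence, $\nu \in C^2(\mathbb{R})$. □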
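
The formula for the second derivative in the proof can be sanity-checked against a finite-difference approximation. This is a minimal sketch for illustration only; the interval $(\alpha, \beta) = (0, 1)$ used in the loop and the step size `h` are arbitrary assumptions.

```python
def nu(x, alpha, beta):
    """The function from the proof: positive on (alpha, beta), zero elsewhere."""
    if alpha < x < beta:
        return (x - alpha) ** 3 * (beta - x) ** 3
    return 0.0


def nu_second(x, alpha, beta):
    """Closed form of the second derivative stated in the proof."""
    if alpha < x < beta:
        u, w = x - alpha, beta - x
        return 6 * u * w * (u**2 + w**2 - 3 * u * w)
    return 0.0


def second_difference(x, alpha, beta, h=1e-4):
    """Central finite-difference approximation to the second derivative."""
    return (nu(x + h, alpha, beta) - 2 * nu(x, alpha, beta) + nu(x - h, alpha, beta)) / h**2


for x in (0.0, 0.3, 0.5, 1.0):
    print(x, nu_second(x, 0.0, 1.0), second_difference(x, 0.0, 1.0))
```

At the endpoints the finite difference is small but not exactly zero, because one of the sampled points lies inside the interval; it tends to zero with `h`, consistent with $\nu''(\alpha) = \nu''(\beta) = 0$.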



Lemma 2.2.2 Suppose that $\langle \eta, g \rangle = 0$ for all $\eta \in H$. If $g : [x_0, x_1] \to \mathbb{R}$ is a
continuous function then $g = 0$ on the interval $[x_0, x_1]$.

Proof: Suppose that $g(c) \ne 0$ for some $c \in [x_0, x_1]$. Without loss of generality it
can be assumed that $g(c) > 0$, and by continuity that $c \in (x_0, x_1)$. Since $g$ is
continuous on $[x_0, x_1]$ there are numbers $\alpha, \beta$ such that $x_0 < \alpha < c < \beta < x_1$
and $g(x) > 0$ for $x \in (\alpha, \beta)$. Lemma 2.2.1 implies that there exists a function
$\nu \in C^2[x_0, x_1]$ such that $\nu(x) > 0$ for all $x \in (\alpha, \beta)$ and $\nu(x) = 0$ for all
$x \in [x_0, x_1] - (\alpha, \beta)$. Therefore, $\nu \in H$ and

$$\langle \nu, g \rangle = \int_{x_0}^{x_1} \nu(x) g(x)\,dx = \int_{\alpha}^{\beta} \nu(x) g(x)\,dx > 0,$$

which contradicts the assumption that $\langle \eta, g \rangle = 0$ for all $\eta \in H$. Thus $g = 0$
on $(x_0, x_1)$ and by continuity $g = 0$ on $[x_0, x_1]$. □

The above result indicates that if $y$ is an extremal, then $E = 0$ for all
$x \in [x_0, x_1]$. Formally, this result is summarized in the next theorem.




Theorem 2.2.3 Let $J : C^2[x_0, x_1] \to \mathbb{R}$ be a functional of the form

$$J(y) = \int_{x_0}^{x_1} f(x, y, y')\,dx,$$

where $f$ has continuous partial derivatives of second order with respect to $x$, $y$,
and $y'$, and $x_0 < x_1$. Let

$$S = \{y \in C^2[x_0, x_1] : y(x_0) = y_0 \text{ and } y(x_1) = y_1\},$$

where $y_0$ and $y_1$ are given real numbers. If $y \in S$ is an extremal for $J$, then

$$\frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) - \frac{\partial f}{\partial y} = 0 \tag{2.9}$$

for all $x \in [x_0, x_1]$.

Equation (2.9) is a second-order (generally nonlinear) ordinary differential
equation that any (smooth) extremal $y$ must satisfy. This differential equation
is called the Euler-Lagrange equation. The boundary values associated
with this equation for the fixed endpoint problem are

$$y(x_0) = y_0, \quad y(x_1) = y_1. \tag{2.10}$$

The Euler-Lagrange equation is the infinite-dimensional analogue of the
equation (2.5). In the transition from finite to infinite dimensions, an algebraic
condition for the determination of points $\mathbf{x} \in \mathbb{R}^n$ which might lead to local
extrema is replaced by a boundary-value problem involving a second-order
differential equation.



Example 2.2.1: Geodesics in the Plane

Let $(x_0, y_0) = (0, 0)$ and $(x_1, y_1) = (1, 1)$. The arclength of a curve described
by $y(x)$, $x \in [0, 1]$ is given by

$$J(y) = \int_0^1 \sqrt{1 + y'^2}\,dx.$$

The geodesic problem in the plane entails determining the function $y$ such
that the arclength is minimum. We limit our investigation to functions in
$C^2[0, 1]$ such that

$$y(0) = 0, \quad y(1) = 1.$$

If $y$ is an extremal for $J$ then the Euler-Lagrange equation must be satisfied;
hence,

$$\frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) - \frac{\partial f}{\partial y} = \frac{d}{dx}\left(\frac{y'}{\sqrt{1 + y'^2}}\right) - 0 = 0;$$

i.e.,

$$\frac{y'}{\sqrt{1 + y'^2}} = \text{const.}$$

The last equation is equivalent to the condition that $y' = c_1$, where $c_1$ is a
constant. Consequently, an extremal for $J$ must be of the form

$$y(x) = c_1 x + c_2,$$

where $c_2$ is another constant of integration. Since $y(0) = 0$, we see that $c_2 = 0$,
and since $y(1) = 1$, we see that $c_1 = 1$. Thus, the only extremal $y$ is given by
$y(x) = x$, which describes the line segment from $(0, 0)$ to $(1, 1)$ in the plane
(as expected). We have not shown that this extremal is in fact a minimum.
(This is shown in Example 10.7.1.)
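
Although the example does not prove minimality, a quick numerical comparison is consistent with it. The sketch below, which is an illustration and not a proof, compares the arclength of the extremal $y = x$ with that of the perturbed curves $y = x + \epsilon \sin(\pi x)$; the perturbation family and the Simpson quadrature are assumptions made for this illustration.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def arclength(slope):
    """Approximate J(y), the integral of sqrt(1 + y'^2) over [0, 1], given y'."""
    return simpson(lambda x: math.sqrt(1.0 + slope(x) ** 2), 0.0, 1.0)


def line_slope(x):
    """Slope of the extremal y = x."""
    return 1.0


def perturbed_slope(eps):
    """Slope of y = x + eps*sin(pi*x), which still joins (0, 0) to (1, 1)."""
    return lambda x: 1.0 + eps * math.pi * math.cos(math.pi * x)


print("extremal:", arclength(line_slope))
for eps in (-0.2, -0.1, 0.1, 0.2):
    print("eps =", eps, "->", arclength(perturbed_slope(eps)))
```

Every perturbed curve in this family should come out longer than the straight segment, whose length is $\sqrt{2}$.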



Example 2.2.2: Let $(x_0, y_0) = (0, 0)$, $(x_1, y_1) = (1, 1)$, and consider the
functional defined by

$$J(y) = \int_0^1 \left(y'^2 - y^2 + 2xy\right) dx.$$

The Euler-Lagrange equation for this functional is

$$\frac{d}{dx}(2y') - (-2y + 2x) = 0;$$

i.e.,

$$y'' + y = x.$$

The homogeneous solution is $y_h(x) = c_1\cos(x) + c_2\sin(x)$, where $c_1$ and $c_2$ are
constants, and the particular solution is $y_p(x) = x$. The general solution to
the Euler-Lagrange equation is thus given by

$$y(x) = c_1\cos(x) + c_2\sin(x) + x.$$

The condition $y(0) = 0$ implies that $c_1 = 0$, and the condition $y(1) = 1$ implies
that $c_2\sin(1) + 1 = 1$, so that $c_2 = 0$. The only extremal for this functional is
thus given by

$$y(x) = x.$$


Example 2.2.3: Let $k$ denote some positive constant and let $J$ be the
functional defined by

$$J(y) = \int_0^{\pi} \left(y'^2 - k y^2\right) dx,$$

with endpoint conditions $y(0) = 0$ and $y(\pi) = 0$. If $y$ is an extremal for $J$ then

$$\frac{d}{dx}(2y') + 2ky = 0;$$

i.e.,

$$y'' + ky = 0.$$

The general solution to the Euler-Lagrange equation is

$$y(x) = c_1\cos(\sqrt{k}\,x) + c_2\sin(\sqrt{k}\,x).$$

Now $y(0) = 0$ implies that $c_1 = 0$, and $y(\pi) = 0$ implies that $c_2\sin(\sqrt{k}\,\pi) = 0$.
If $\sqrt{k}$ is not an integer, then $c_2 = 0$, and the only extremal is $y = 0$. If $\sqrt{k}$ is
an integer, then $\sin(\sqrt{k}\,\pi) = 0$ and $c_2$ can be any number. In the latter case
we have an infinite number of extremals of the form $y(x) = c_2\sin(\sqrt{k}\,x)$.



Exercises 2.2: 

1. Alternative Proof of Condition (2.6): Let $y \in S$ and $\eta \in H$ be fixed
functions. Then the quantity $J(y + \epsilon\eta)$ can be regarded as a function of
the single real variable $\epsilon$. Show that the equation $dJ/d\epsilon = 0$ at $\epsilon = 0$ leads
to condition (2.6) under the same hypotheses for $f$.

2. The First Variation: Let $J : S \to \Omega$ and $K : S \to \Omega$ be functionals
defined by

$$J(y) = \int_{x_0}^{x_1} f(x, y, y')\,dx, \quad K(y) = \int_{x_0}^{x_1} g(x, y, y')\,dx,$$

where $f$ and $g$ are smooth functions of the indicated arguments and $\Omega \subset \mathbb{R}$.

(a) Show that for any real numbers $A$ and $B$,

$$\delta(AJ + BK)(\eta, y) = A\,\delta J(\eta, y) + B\,\delta K(\eta, y) \tag{2.11}$$

(i.e., $\delta$ is a linear operator), and

$$\delta(JK)(\eta, y) = K(y)\,\delta J(\eta, y) + J(y)\,\delta K(\eta, y) \tag{2.12}$$

(a product rule).

(b) Suppose that $G : \Omega \times \Omega \to \mathbb{R}$ is differentiable on $\Omega \times \Omega$. Show that

$$\delta G(J, K)(\eta, y) = \frac{\partial G}{\partial J}\,\delta J(\eta, y) + \frac{\partial G}{\partial K}\,\delta K(\eta, y) \tag{2.13}$$

(a “chain rule” for the $\delta$ operator).

3. Let $n$ be any positive integer. Extend Lemma 2.2.1 by showing that there
exists a $\nu \in C^n(\mathbb{R})$ such that $\nu(x) > 0$ for all $x \in (\alpha, \beta)$ and $\nu = 0$ for all
$x \in \mathbb{R} \setminus (\alpha, \beta)$.







4. Let $J$ be the functional defined by

$$J(y) = \int_0^1 \left(y'^2 + y^2 + 4ye^x\right) dx,$$

with boundary conditions $y(0) = 0$ and $y(1) = 1$. Find the extremal(s) in
$C^2[0, 1]$ for $J$.

5. Consider the functional defined by

$$J(y) = \int_{-1}^{1} x^4 y'^2\,dx.$$

(a) Show that no extremals in $C^2[-1, 1]$ exist which satisfy the boundary
conditions $y(-1) = -1$, $y(1) = 1$.

(b) Without resorting to the Euler-Lagrange equation, prove that $J$ cannot
have a local minimum in the set

$$S = \{y \in C^2[-1, 1] : y(-1) = -1 \text{ and } y(1) = 1\}.$$



2.3 Some Special Cases 

The Euler-Lagrange equation is a second-order nonlinear differential equation, 
and such equations are usually difficult to simplify let alone solve. There 
are, however, certain cases when this differential equation can be simplified. 
We examine two such cases in this section. We suppose throughout that the 
functional satisfies the conditions of Theorem 2.2.3. 



2.3.1 Case I: No Explicit y Dependence 



Suppose that the functional is of the form

$$J(y) = \int_{x_0}^{x_1} f(x, y')\,dx,$$

where the variable $y$ does not appear explicitly in the integrand. Evidently,
the Euler-Lagrange equation reduces to

$$\frac{\partial f}{\partial y'} = c_1, \tag{2.14}$$

where $c_1$ is a constant of integration. Now $\partial f/\partial y'$ is a known function of $x$
and $y'$, so that equation (2.14) is a first-order differential equation for $y$. In
principle, equation (2.14) is solvable for $y'$, provided $\partial^2 f/\partial y'^2 \ne 0$,$^5$ so that
equation (2.14) could be recast in the form

$$y' = g(x, c_1),$$

for some function $g$ and then integrated. In practice, however, solving equation
(2.14) for $y'$ can prove formidable if not impossible, and there may be several
solutions available. Nonetheless, the absence of $y$ in the integrand simplifies the
problem of solving a second-order differential equation to solving an implicit
equation and quadratures.

$^5$ One can invoke a variant of the implicit function theorem (Appendix A.2).

Example 2.3.1: Let

$$J(y) = \int_{x_0}^{x_1} e^x \sqrt{1 + y'^2}\,dx.$$

The Euler-Lagrange equation for this functional leads to the equation

$$\frac{\partial f}{\partial y'} = \frac{e^x y'}{\sqrt{1 + y'^2}} = c_1, \tag{2.15}$$

where $c_1$ is a constant of integration. Note that $|y'/\sqrt{1 + y'^2}| < 1$ so that
$|c_1| < e^{x_0}$. Equation (2.15) can be solved for $y'$ to get

$$y' = \frac{c_1}{\sqrt{e^{2x} - c_1^2}},$$

and integrating this expression with respect to $x$ yields

$$y(x) = \sec^{-1}\left(\frac{e^x}{c_1}\right) + c_2,$$

where $c_2$ is another constant.
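
As a check on the integration, one can verify numerically that $y(x) = \sec^{-1}(e^x/c_1) + c_2$ keeps the quantity $e^x y'/\sqrt{1 + y'^2}$ equal to $c_1$. The sketch below is illustrative only: the value $c_1 = 0.5$ and the sample points are arbitrary assumptions, the derivative is approximated by a central difference, and the identity $\sec^{-1}(u) = \cos^{-1}(1/u)$ is used to evaluate the inverse secant.

```python
import math


def y(x, c1, c2=0.0):
    """y(x) = arcsec(e^x / c1) + c2, written as acos(c1 * e^(-x)) + c2."""
    return math.acos(c1 * math.exp(-x)) + c2


def conserved(x, c1, h=1e-5):
    """e^x * y' / sqrt(1 + y'^2), with y' from a central difference."""
    yp = (y(x + h, c1) - y(x - h, c1)) / (2 * h)
    return math.exp(x) * yp / math.sqrt(1.0 + yp**2)


c1 = 0.5  # must satisfy |c1| < e^x on the interval considered
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, conserved(x, c1))
```

Each printed value should agree with the chosen $c_1$ up to the finite-difference error.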



Example 2.3.2: Geodesics on a Sphere

In Example 1.4.1, let $u = \theta$ and $v = \phi$. Suppose that we choose $t = u$, so that
we regard $\phi$ as a function of $\theta$. Up to a constant factor given by the radius of
the sphere, the arclength functional for the sphere is then

$$J(\phi) = \int_{\theta_0}^{\theta_1} \sqrt{1 + \phi'^2 \sin^2\theta}\,d\theta, \tag{2.16}$$

where $\phi'$ denotes $d\phi/d\theta$. The integrand does not contain $\phi$ explicitly, and
therefore the Euler-Lagrange equation gives

$$\frac{\phi' \sin^2\theta}{\sqrt{1 + \phi'^2 \sin^2\theta}} = c_1, \tag{2.17}$$

where $c_1$ is a constant. Now, $\phi'^2 \sin^4\theta \le \phi'^2 \sin^2\theta < 1 + \phi'^2 \sin^2\theta$, and therefore
$-1 < c_1 < 1$. Hence, we can replace $c_1$ by the constant $\sin\alpha$. Equation (2.17)
implies

$$\phi' = \frac{\sin\alpha}{\sin\theta \sqrt{\sin^2\theta - \sin^2\alpha}};$$

thus,

$$\phi = \sin\alpha \int_{\theta_0}^{\theta} \frac{d\xi}{\sin\xi \sqrt{\sin^2\xi - \sin^2\alpha}} + \beta,$$

where $\beta = \phi(\theta_0)$. The above equation yields the implicit relation

$$\cos(\phi + \beta) = \frac{\tan\alpha}{\tan\theta}, \tag{2.18}$$

or in Cartesian coordinates,

$$x\cos\beta - y\sin\beta = z\tan\alpha. \tag{2.19}$$

Equation (2.19) is the equation of a plane through the centre of the sphere.
The geodesic corresponds to the intersection of this plane with the sphere;
hence, it must be an arc of a great circle.



2.3.2 Case II: No Explicit x Dependence 



Another simplification is available when the integrand does not contain the 
independent variable x explicitly. 

Theorem 2.3.1 Let $J$ be a functional of the form

$$J(y) = \int_{x_0}^{x_1} f(y, y')\,dx,$$

and define the function $H$ by

$$H(y, y') = y' \frac{\partial f}{\partial y'} - f.$$

Then $H$ is constant along any extremal $y$.

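Proof: Suppose that $y$ is an extremal for $J$. Now,

$$\begin{aligned}
\frac{d}{dx} H(y, y') &= \frac{d}{dx}\left(y' \frac{\partial f}{\partial y'} - f\right) \\
&= y'' \frac{\partial f}{\partial y'} + y' \frac{d}{dx}\frac{\partial f}{\partial y'} - \left(y' \frac{\partial f}{\partial y} + y'' \frac{\partial f}{\partial y'}\right) \\
&= y' \left(\frac{d}{dx}\frac{\partial f}{\partial y'} - \frac{\partial f}{\partial y}\right),
\end{aligned}$$

and since $y$ is an extremal, the Euler-Lagrange equation (2.9) is satisfied;
hence,

$$\frac{d}{dx} H(y, y') = 0.$$

Consequently, $H$ must be constant along an extremal. □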

Note that the function $H$ depends only on $y$ and $y'$, and thus the equation

$$H(y, y') = \text{const.} \tag{2.20}$$

is a first-order differential equation for the extremal $y$.

Example 2.3.3: Catenary

The catenary problem (Section 1.2) has a functional of the form

$$J(y) = \int_{x_0}^{x_1} y\sqrt{1 + y'^2}\,dx.$$

The above integrand does not contain $x$ explicitly and therefore

$$H(y, y') = y' \frac{\partial f}{\partial y'} - f = y' \left(\frac{y y'}{\sqrt{1 + y'^2}}\right) - y\sqrt{1 + y'^2} = -\frac{y}{\sqrt{1 + y'^2}}$$

is constant along an extremal. Any extremal $y$ must consequently satisfy the
first-order differential equation

$$\frac{y^2}{1 + y'^2} = c_1^2, \tag{2.21}$$

where $c_1$ is a constant. If $c_1 = 0$, then the only solution to equation (2.21) is
$y = 0$. Suppose that $c_1 \ne 0$; then equation (2.21) can be replaced by

$$y' = \sqrt{\frac{y^2}{c_1^2} - 1}. \tag{2.22}$$

We integrate equation (2.22) for $x$ as a function of $y$, viz.,

$$x = \int \frac{dy}{\sqrt{\dfrac{y^2}{c_1^2} - 1}} = c_1 \ln\left(\frac{y + \sqrt{y^2 - c_1^2}}{c_1}\right) + c_2,$$

where $c_2$ is a constant of integration. Now,

$$c_1 e^{(x - c_2)/c_1} = y + \sqrt{y^2 - c_1^2},$$

and

$$c_1 e^{-(x - c_2)/c_1} = \frac{c_1^2}{y + \sqrt{y^2 - c_1^2}};$$

therefore,

$$c_1 \left(e^{(x - c_2)/c_1} + e^{-(x - c_2)/c_1}\right) = y + \sqrt{y^2 - c_1^2} + \frac{c_1^2}{y + \sqrt{y^2 - c_1^2}} = 2y.$$

The extremals are thus given by

$$y(x) = c_1 \cosh\left(\frac{x - c_2}{c_1}\right).$$
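
The conservation law behind the catenary can be checked directly: along $y = c_1\cosh((x - c_2)/c_1)$ the function $H = y'\,\partial f/\partial y' - f$ should take the same value at every $x$. The following sketch evaluates $H$ at a few sample points; the constants $c_1 = 2$ and $c_2 = 0.5$ are arbitrary assumptions chosen for illustration.

```python
import math


def H(y, yp):
    """H = y' * f_{y'} - f for the integrand f = y * sqrt(1 + y'^2)."""
    f = y * math.sqrt(1.0 + yp**2)
    f_yp = y * yp / math.sqrt(1.0 + yp**2)
    return yp * f_yp - f


def catenary(x, c1, c2):
    """Return (y, y') for y = c1 * cosh((x - c2) / c1)."""
    t = (x - c2) / c1
    return c1 * math.cosh(t), math.sinh(t)


c1, c2 = 2.0, 0.5
for x in (-1.0, 0.0, 0.5, 1.0, 3.0):
    y, yp = catenary(x, c1, c2)
    print(x, H(y, yp))
```

Since $H = -y/\sqrt{1 + y'^2}$ and $\sqrt{1 + \sinh^2 t} = \cosh t$, every value printed should equal $-c_1$ up to rounding.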



Example 2.3.4: Brachystochrone

The brachystochrone problem (Section 1.2) has a functional of the form

$$J(y) = \int_{x_0}^{x_1} \sqrt{\frac{1 + y'^2}{y}}\,dx.$$

The integrand does not depend on $x$ explicitly; thus,

$$H(y, y') = \frac{y'^2}{\sqrt{y(1 + y'^2)}} - \sqrt{\frac{1 + y'^2}{y}} = -\frac{1}{\sqrt{y(1 + y'^2)}}$$

is constant along an extremal. If $y$ is an extremal for $J$ then it must satisfy
the first-order differential equation

$$y(1 + y'^2) = c_1, \tag{2.23}$$

where $c_1$ is a constant. Equation (2.23) can be solved parametrically. Let
$y' = \tan\psi$; then $1 + y'^2 = \sec^2\psi$ and

$$y = \frac{c_1}{\sec^2\psi} = c_1\cos^2\psi = \kappa_1(1 + \cos(2\psi)), \tag{2.24}$$

where $\kappa_1 = c_1/2$. Now,

$$dy = -4\kappa_1 \cos\psi \sin\psi\,d\psi$$

and

$$dx = \cot\psi\,dy = -4\kappa_1 \cos^2\psi\,d\psi = -2\kappa_1(1 + \cos(2\psi))\,d\psi.$$

Therefore,

$$x = \kappa_2 - \kappa_1(2\psi + \sin(2\psi)), \tag{2.25}$$

where $\kappa_2$ is an integration constant. Equations (2.24) and (2.25) provide a
parametric solution to the problem. The solution curves belong to a well-known
class of plane curves called cycloids (Section 1.2).
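
The parametric solution can be checked against the first-order equation $y(1 + y'^2) = c_1$, with $c_1 = 2\kappa_1$. In the sketch below the slope $dy/dx$ is estimated as a ratio of central differences in the parameter $\psi$; the values $\kappa_1 = 1.5$, $\kappa_2 = 0$ and the sample parameters are arbitrary assumptions for illustration.

```python
import math


def cycloid(psi, k1, k2):
    """Parametric solution: x = k2 - k1*(2*psi + sin(2*psi)), y = k1*(1 + cos(2*psi))."""
    x = k2 - k1 * (2 * psi + math.sin(2 * psi))
    y = k1 * (1 + math.cos(2 * psi))
    return x, y


def slope(psi, k1, k2, h=1e-6):
    """dy/dx estimated as (dy/dpsi) / (dx/dpsi) by central differences."""
    x_plus, y_plus = cycloid(psi + h, k1, k2)
    x_minus, y_minus = cycloid(psi - h, k1, k2)
    return (y_plus - y_minus) / (x_plus - x_minus)


def invariant(psi, k1, k2):
    """Left-hand side y * (1 + y'^2) of the first-order equation."""
    _, y = cycloid(psi, k1, k2)
    return y * (1.0 + slope(psi, k1, k2) ** 2)


k1, k2 = 1.5, 0.0
for psi in (0.3, 0.7, 1.0):
    print(psi, invariant(psi, k1, k2), math.tan(psi), slope(psi, k1, k2))
```

The second column should stay at $2\kappa_1$, and the last two columns should agree, reflecting the substitution $y' = \tan\psi$.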

The simplification when $f$ does not depend on $y$ explicitly is more or less
obvious from the Euler-Lagrange equation; the simplification when $x$ is absent
in $f$ is less obvious. In particular, what leads one to consider a function such
as H in the first place? Equation (2.20) is an example of a conservation 
law: along any extremal, the quantity H is conserved. In problems concern- 
ing classical mechanics, H often represents the total energy of the system. 
One can thus be led to consider a function such as H from the physics of 
whatever the functional is modelling if a conservation law is known. Mathe- 
matically, this approach is not very satisfactory. One immediately questions 
whether other conservation laws exist and if there are any other special cases 
for the integrand leading to conservation laws. In fact, there are ways to de- 
duce conservation laws mathematically. Noether’s theorem provides a general 
framework in which to derive conservation laws. We discuss this theorem in 
Chapter 9. 

Exercises 2.3: 

1. Find the general solution to the Euler-Lagrange equation corresponding
to the functional

$$J(y) = \int_{x_0}^{x_1} f(x)\sqrt{1 + y'^2}\,dx,$$

where $x_0 > 0$, and investigate the special cases: (i) $f(x) = \sqrt{x}$, (ii) $f(x) = x$.

2. Find the extremals for the functional defined by 

x 6 

where xq > 0. 

3. Let

$$J(y) = \int_2^3 y^2 (1 - y')^2\,dx.$$

Find a smooth extremal for $J$ satisfying the boundary conditions $y(2) = 1$
and $y(3) = \sqrt{3}$.








2.4 A Degenerate Case 

In the examples so far, the integrand of the functional depends on y' in some 
nonlinear way. If the integrand is linear in y' , the problem becomes degenerate 
in a sense that is explained in this section. 

Suppose that J is a functional of the form 

rxi 

J(y)= {A{x,y)y' + B(x,y)) dx , 

J x 0 

where A and B are smooth functions of x and y. The Euler-Lagrange equation 
for this functional is 



d 

dx 



A(x,y) 



,dA 
' dy 




= 0 . 



But 



d dA . dA 

dx A{z ’ v)= ai + ,) W 



so that the Euler-Lagrange equation reduces to 



d A dB 
dx dy 



(2.26) 



Note that equation (2.26) is not even a differential equation for y: it is an 
implicit equation for y that may or may not have solutions depending on 
the given functions A and B. Moreover, equation (2.26) contains no arbitrary 
constants so that arbitrary boundary conditions cannot be imposed on any 
solutions. 
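
As a small illustration of the first kind of degeneracy, consider the following worked case. The integrand is a hypothetical choice made for this sketch, not one taken from the text; it has $A(x, y) = y$ and $B(x, y) = xy - \tfrac{1}{2}y^2$.

```latex
f(x, y, y') = y\,y' + xy - \tfrac{1}{2}y^2,
\qquad
\frac{\partial A}{\partial x} - \frac{\partial B}{\partial y}
  = 0 - (x - y) = 0
\quad\Longrightarrow\quad
y(x) = x .
```

Here equation (2.26) fixes y completely, with no constants left to adjust, so an extremal exists only when the boundary data happen to satisfy $y(x_0) = x_0$ and $y(x_1) = x_1$.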

It may be that equation (2.26) is satisfied for all x and y: i.e., $A_x = B_y$ is
an identity. In this case equation (2.26) places no restriction on y, but it does
imply the existence of a function $\phi(x, y)$ such that $\phi_y = A$ and $\phi_x = B$. In
this case the integrand can be written as

\[ f = \frac{\partial \phi}{\partial x} + y'\,\frac{\partial \phi}{\partial y} = \frac{d\phi}{dx}; \]

that is, $f\,dx = d\phi$ (an exact differential).⁶ Consequently,

\[ J(y) = \int_{x_0}^{x_1} d\phi = \phi(x_1, y(x_1)) - \phi(x_0, y(x_0)), \]

so that J depends only on $\phi$ and the endpoints $(x_0, y(x_0))$ and $(x_1, y(x_1))$. The
value of J is thus independent of y, so that the integral is path independent.

⁶ Equation (2.26) is a well-known integrability condition (cf. [44], p. 529).







Example 2.4.1: Let $f(x, y, y') = (x^2 + 3y^2)y' + 2xy$. Here, $A_x = 2x = B_y$,
so that the value of the functional defined by

\[ J(y) = \int_{x_0}^{x_1} \bigl( (x^2 + 3y^2)\,y' + 2xy \bigr)\,dx \]

is independent of the choice of y. A function $\phi$ can be found by integrating
the equations $B = \phi_x$ and $A = \phi_y$. For example, $\phi_x = B = 2xy$; hence,

\[ \phi = x^2 y + C(y), \]

where C is some function of y to be determined. Now

\[ \phi_y = x^2 + C'(y) = A = x^2 + 3y^2, \]

and so

\[ \phi = x^2 y + y^3 + k, \]

where k is an arbitrary constant. Thus,

\[ J(y) = \phi(x_1, y(x_1)) - \phi(x_0, y(x_0)) = x_1^2 y_1 + y_1^3 - (x_0^2 y_0 + y_0^3). \]

(Note that the arbitrary constant k vanishes from the final answer.)
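
The path independence in Example 2.4.1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the two trial curves, the endpoints (0, 0) and (1, 1), and the helper names `simpson`, `J`, and `phi` are illustrative assumptions, not part of the text.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def J(y, dy, x0, x1):
    """Value of J for the integrand f = (x^2 + 3 y^2) y' + 2 x y."""
    return simpson(lambda x: (x**2 + 3 * y(x)**2) * dy(x) + 2 * x * y(x), x0, x1)

def phi(x, y):
    """Potential phi = x^2 y + y^3, taking the arbitrary constant k = 0."""
    return x**2 * y + y**3

# Two hypothetical trial curves, both joining (0, 0) to (1, 1).
x0, x1 = 0.0, 1.0
j_line = J(lambda x: x, lambda x: 1.0, x0, x1)
j_sine = J(lambda x: math.sin(math.pi * x / 2),
           lambda x: (math.pi / 2) * math.cos(math.pi * x / 2), x0, x1)

# Difference of the potential between the endpoints.
expected = phi(x1, 1.0) - phi(x0, 0.0)

# The three numbers should agree up to quadrature error.
print(j_line, j_sine, expected)
```

Both curves give the same value as the difference of $\phi$ at the endpoints, which is what the example predicts; any other smooth curve through the same endpoints could be substituted.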



In summary, variational problems with integrands of the form $A(x, y)y' + B(x, y)$
are degenerate in that either y is determined implicitly and can satisfy
only very special sets of boundary data, or the value of the corresponding func- 
tional does not depend on the choice of y. In the latter case the determination 
of local extrema is vacuous. 

An immediate concern is that there may be other forms of integrands that 
lead to path independent functionals. These functionals are characterized by 
the property that the Euler-Lagrange equation reduces to an identity valid 
for all x and y in the space under consideration. The next theorem shows that 
in fact the integrand must be linear in y' for such an identity to be valid. 

Theorem 2.4.1 Suppose that the functional J satisfies the conditions of The- 
orem 2.2.3 and that the Euler-Lagrange equation (2.9) reduces to an identity. 
Then, the integrand must be linear in y' , and the value of the functional is 
independent of y. 

Proof: If the Euler-Lagrange equation is an identity, then 



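\[ \frac{\partial f}{\partial y} - \frac{\partial^2 f}{\partial x\,\partial y'} - y'\,\frac{\partial^2 f}{\partial y\,\partial y'} - y''\,\frac{\partial^2 f}{\partial y'^2} = 0 \tag{2.27} \]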



for all $x \in [x_0, x_1]$ and $y \in S$. Now, $y''$ appears only in the last term on the
left-hand side of the equation, and since equation (2.27) must hold for all
$y \in S$ we must have that $\partial^2 f/\partial y'^2 = 0$. The integrand must therefore be of
the form

\[ f(x, y, y') = A(x, y)\,y' + B(x, y), \]

where $A_x = B_y$ for all $x \in [x_0, x_1]$ and $y \in S$.

□
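
The two steps that the proof leaves implicit can be written out as follows. This is a sketch of the intermediate algebra under the smoothness assumptions of the theorem; the subscript notation for partial derivatives is the same shorthand used for $A_x$ and $B_y$.

```latex
% Chain rule applied to the first term of the Euler-Lagrange equation,
% which gives the left-hand side of (2.27):
\frac{d}{dx}\frac{\partial f}{\partial y'}
  = \frac{\partial^2 f}{\partial x\,\partial y'}
  + y'\,\frac{\partial^2 f}{\partial y\,\partial y'}
  + y''\,\frac{\partial^2 f}{\partial y'^2}.

% With f = A(x,y) y' + B(x,y), the remaining terms of (2.27) become
\frac{\partial f}{\partial y}
  - \frac{\partial^2 f}{\partial x\,\partial y'}
  - y'\,\frac{\partial^2 f}{\partial y\,\partial y'}
  = (A_y\,y' + B_y) - A_x - y' A_y
  = B_y - A_x = 0.
```

The last line is equation (2.26) holding as an identity, so the earlier discussion of this section applies and the value of the functional does not depend on the choice of y.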



2.5 Invariance of the Euler-Lagrange Equation 



The principles in physics that lead to variational formulations do not depend 
on coordinate systems. Geometrical problems such as the determination of 
geodesics are likewise “coordinate free” in character. The path of a particle, 
for instance, does not depend on the coordinate system the observer uses to 
describe it; a geodesic does not depend on a particular parametrization of 
the surface. These types of problems can be framed in terms of maximizing 
functionals and ultimately lead to solutions to an Euler-Lagrange equation. 
On physical (and geometrical) grounds one thus expects the Euler-Lagrange 
equation to also be invariant with respect to coordinate transformations. In 
this section we take an informal but practical look at the invariance of the 
Euler-Lagrange equation. 

A coordinate transformation 



\[ x = x(u, v), \qquad y = y(u, v), \tag{2.28} \]



is called smooth if the functions x and y have continuous partial derivatives 
with respect to u and v. A smooth transformation is called nonsingular if 
the Jacobian 



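\[ \frac{\partial(x, y)}{\partial(u, v)} = \det \begin{pmatrix} x_u & y_u \\ x_v & y_v \end{pmatrix} \]

satisfies the condition

\[ \frac{\partial(x, y)}{\partial(u, v)} \neq 0. \tag{2.29} \]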



Here we use the notation $x_u = \partial x/\partial u$, etc., for succinctness. Note that
condition (2.29) implies that the transformation is invertible: to every pair (x, y)
there corresponds a unique pair (u, v) satisfying equation (2.28).⁷ We assume
that the coordinate transformation defined by equation (2.28) is smooth and
nonsingular.

⁷ This result follows from the implicit function theorem; see Theorem A.2.2.

Let J be a functional of the form

\[ J(y) = \int_{x_0}^{x_1} f(x, y, y')\,dx, \tag{2.30} \]

and let S be defined by

\[ S = \{ y \in C^2[x_0, x_1] : y(x_0) = y_0 \text{ and } y(x_1) = y_1 \}, \]



where $y_0$ and $y_1$ are given numbers. Suppose now that we write the functional
in terms of the (u, v) coordinates and, for definiteness, let us regard v as a
function of u. Then,

\[ \frac{dy}{dx} = \frac{dy/du}{dx/du} = \frac{y_u + y_v \dot{v}}{x_u + x_v \dot{v}}, \]

and

\[ dx = \frac{dx}{du}\,du = (x_u + x_v \dot{v})\,du, \]

where $\dot{v}$ denotes $dv/du$. The functional defined by equation (2.30) thus becomes

\[ J(y) = \int_{x_0}^{x_1} f(x, y, y')\,dx
        = \int_{u_0}^{u_1} f\!\left( x(u, v),\, y(u, v),\, \frac{y_u + y_v \dot{v}}{x_u + x_v \dot{v}} \right) (x_u + x_v \dot{v})\,du
        = \int_{u_0}^{u_1} F(u, v, \dot{v})\,du. \]



Here, the numbers $u_0$, $u_1$ and the new boundary values $v(u_0) = v_0$, $v(u_1) = v_1$
are the unique solutions to the equations

\[ x_0 = x(u_0, v_0), \qquad x_1 = x(u_1, v_1), \]
\[ y_0 = y(u_0, v_0), \qquad y_1 = y(u_1, v_1). \]



For clarity, let 

\[ K(v) = \int_{u_0}^{u_1} F(u, v, \dot{v})\,du, \tag{2.31} \]

and let T be the set defined by

\[ T = \{ v \in C^2[u_0, u_1] : v(u_0) = v_0 \text{ and } v(u_1) = v_1 \}. \]

Given a curve in the xy-plane described by a function y = y(x), the transformation
(2.28) defines the curve in the uv-plane described by some function
v = v(u). The essence of the invariance question is: if $v \in T$ is an extremal for
K, is $y \in S$ an extremal for J (and vice versa)? The next theorem resolves
this question.

Theorem 2.5.1 Let $y \in S$ and $v \in T$ be two functions that satisfy the smooth
nonsingular transformation (2.28). Then y is an extremal for J if and only if
v is an extremal for K.

Proof: Suppose that $v \in T$ is an extremal for K. Then, v satisfies the
Euler-Lagrange equation

\[ \frac{d}{du} \frac{\partial F}{\partial \dot{v}} - \frac{\partial F}{\partial v} = 0. \tag{2.32} \]







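Now,

\[ F(u, v, \dot{v}) = f\!\left( x(u, v),\, y(u, v),\, \frac{y_u + y_v \dot{v}}{x_u + x_v \dot{v}} \right) (x_u + x_v \dot{v}), \]

so that

\[ \frac{\partial F}{\partial \dot{v}} = \frac{\partial f}{\partial y'}\,(x_u + x_v \dot{v})\,\frac{\partial}{\partial \dot{v}}\!\left( \frac{y_u + y_v \dot{v}}{x_u + x_v \dot{v}} \right) + x_v f, \]

and

\[ \frac{\partial F}{\partial v} = \left( \frac{\partial f}{\partial x}\,x_v + \frac{\partial f}{\partial y}\,y_v + \frac{\partial f}{\partial y'}\,\frac{\partial}{\partial v}\!\left( \frac{y_u + y_v \dot{v}}{x_u + x_v \dot{v}} \right) \right) (x_u + x_v \dot{v}) + f\,\frac{\partial}{\partial v}(x_u + x_v \dot{v}). \]

A straightforward but tedious calculation shows that

\[ \frac{d}{du} \frac{\partial F}{\partial \dot{v}} - \frac{\partial F}{\partial v} = \frac{\partial(x, y)}{\partial(u, v)} \left( \frac{d}{dx} \frac{\partial f}{\partial y'} - \frac{\partial f}{\partial y} \right). \tag{2.33} \]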



Since the transformation is nonsingular, the Jacobian is nonzero; hence, if v 
is an extremal for K then equations (2.32) and (2.33) imply that 

\[ \frac{d}{dx} \frac{\partial f}{\partial y'} - \frac{\partial f}{\partial y} = 0, \]

so that y must be an extremal for J . Equation (2.33) also implies the converse. 

□ 



It is philosophically reassuring that the path of a particle is independent 
of the observer’s choice of coordinates. There is also a practical implication: 
coordinate transformations can be made in the functional before the Euler- 
Lagrange equation is formulated. An example suffices to illustrate the value 
of this observation. 

Example 2.5.1: Let J be the functional defined by 

\[ J(y) = \int_{x_0}^{x_1} \sqrt{x^2 + y^2}\,\sqrt{1 + y'^2}\,dx. \]

The integrand contains both x and y so that there are no conspicuous first 
integrals of the Euler-Lagrange equation 



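\[ \frac{d}{dx} \left( \frac{\sqrt{x^2 + y^2}\;y'}{\sqrt{1 + y'^2}} \right) - \frac{y\,\sqrt{1 + y'^2}}{\sqrt{x^2 + y^2}} = 0. \tag{2.34} \]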





On the other hand, the presence of the term $\sqrt{x^2 + y^2}$ suggests the use of
polar coordinates. Let

\[ x = x(\phi, r) = r\cos\phi, \]
\[ y = y(\phi, r) = r\sin\phi. \]



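This transformation is evidently smooth, and since

\[ \frac{\partial(x, y)}{\partial(\phi, r)} = \det \begin{pmatrix} x_\phi & y_\phi \\ x_r & y_r \end{pmatrix}
   = \det \begin{pmatrix} -r\sin\phi & r\cos\phi \\ \cos\phi & \sin\phi \end{pmatrix} = -r, \]

the transformation is nonsingular, provided $r \neq 0$. Now, suppose that r is
regarded as a function of $\phi$; then

\[ y' = \frac{y_\phi + y_r \dot{r}}{x_\phi + x_r \dot{r}} = \frac{r\cos\phi + \dot{r}\sin\phi}{-r\sin\phi + \dot{r}\cos\phi}, \]

so that

\[ \sqrt{1 + y'^2}\,dx = \sqrt{r^2 + \dot{r}^2}\,d\phi. \]

The functional J thus becomes

\[ K(r) = \int_{\phi_0}^{\phi_1} r\,\sqrt{r^2 + \dot{r}^2}\,d\phi = \int_{\phi_0}^{\phi_1} F(r, \dot{r})\,d\phi. \tag{2.35} \]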



The integrand does not depend on $\phi$ explicitly, and therefore the corresponding
Euler-Lagrange equation has a first integral

\[ H(r, \dot{r}) = \dot{r}\,\frac{\partial F}{\partial \dot{r}} - F = -\frac{r^3}{\sqrt{r^2 + \dot{r}^2}} = \text{const.}; \]

that is,

\[ \dot{r} = r\sqrt{c_1^2 r^4 - 1}, \tag{2.36} \]

where $c_1$ is a nonzero constant. Equation (2.36) can be integrated to solve for
$\phi$ as a function of r,

\[ \int \frac{dr}{r\sqrt{c_1^2 r^4 - 1}} = -\frac{1}{2}\sin^{-1}\!\left( \frac{1}{c_1 r^2} \right) = \phi + c_2, \]

where $c_2$ is a constant. Thus, for $\kappa_1 = 1/c_1$ and $\kappa_2 = -2c_2$, the function $r(\phi)$
is given implicitly by

\[ \frac{\kappa_1}{r^2} = \sin(-2\phi + \kappa_2)
   = -\sin(2\phi)\cos\kappa_2 + \cos(2\phi)\sin\kappa_2
   = -2\sin\phi\cos\phi\cos\kappa_2 + (2\cos^2\phi - 1)\sin\kappa_2. \]

In terms of the original Cartesian coordinate system, the above expression is
equivalent to

\[ \kappa_1 = x^2\sin\kappa_2 - 2xy\cos\kappa_2 - y^2\sin\kappa_2. \tag{2.37} \]
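
The result of Example 2.5.1 can be sanity-checked numerically. The sketch below uses only the Python standard library and makes several assumptions of its own: the constants `KAPPA1` and `KAPPA2` are arbitrary sample values, the derivative of r is approximated by a central difference, and the sample angles are chosen so that $\sin(\kappa_2 - 2\phi) > 0$. It checks that $r^3/\sqrt{r^2 + \dot{r}^2}$ stays constant along the curve and that the Cartesian relation (2.37) holds at the same points.

```python
import math

# Hypothetical sample constants; sin(KAPPA2 - 2*phi) > 0 on the angles used below.
KAPPA1, KAPPA2 = 2.0, 2.5

def r(phi):
    """r(phi) defined implicitly by kappa_1 / r^2 = sin(kappa_2 - 2 phi)."""
    return math.sqrt(KAPPA1 / math.sin(KAPPA2 - 2.0 * phi))

def r_dot(phi, h=1e-5):
    """Central-difference approximation to dr/dphi."""
    return (r(phi + h) - r(phi - h)) / (2.0 * h)

def first_integral(phi):
    """Magnitude of the first integral, r^3 / sqrt(r^2 + r_dot^2)."""
    rv, rd = r(phi), r_dot(phi)
    return rv**3 / math.sqrt(rv**2 + rd**2)

def cartesian_residual(phi):
    """kappa_1 minus the right-hand side of (2.37) at (r cos phi, r sin phi)."""
    x, y = r(phi) * math.cos(phi), r(phi) * math.sin(phi)
    rhs = (x**2 * math.sin(KAPPA2)
           - 2.0 * x * y * math.cos(KAPPA2)
           - y**2 * math.sin(KAPPA2))
    return KAPPA1 - rhs

samples = [0.2 + 0.1 * i for i in range(8)]          # phi from 0.2 to 0.9
values = [first_integral(p) for p in samples]        # should all be close to KAPPA1
residuals = [cartesian_residual(p) for p in samples] # should all be close to 0

print(values)
print(residuals)
```

With these sample values the conserved quantity comes out equal to $\kappa_1 = 1/c_1$, which matches the way the constant enters equation (2.36).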



Exercises 2.5: 



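1. Change of Variable: Let $\psi : [t_0, t_1] \to \mathbb{R}$ be a smooth function on the
interval $[t_0, t_1]$ such that $\psi'(t) > 0$ for all $t \in [t_0, t_1]$ and let $\psi(t_0) = x_0$,
$\psi(t_1) = x_1$. Using the transformation $x = \psi(t)$, the functional J defined by

\[ J(y) = \int_{x_0}^{x_1} f(x, y, y')\,dx \]

can be transformed to the functional K defined by

\[ K(Y) = \int_{t_0}^{t_1} F(t, Y, \dot{Y})\,dt, \]

where, for $Y(t) = y(\psi(t))$, $\dot{Y}$ denotes $dY/dt$ and

\[ F(t, Y, \dot{Y}) = f\!\left( \psi(t),\, Y,\, \frac{\dot{Y}}{\psi'(t)} \right) \psi'(t). \]

Prove by direct calculation that

\[ \frac{d}{dt} \frac{\partial F}{\partial \dot{Y}} - \frac{\partial F}{\partial Y} = \psi'(t) \left( \frac{d}{dx} \frac{\partial f}{\partial y'} - \frac{\partial f}{\partial y} \right) \]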

and hence that y is an extremal for J if and only if Y is an extremal for 
K. 

2. Let J be the functional defined by 




Find an extremal for J satisfying the boundary conditions $r(\pi/2) = 1$ and
$r(\pi) = -1$.

3. Let J be a functional of the form 

\[ J(y) = \int_{x_0}^{x_1} g(x^2 + y^2)\,\sqrt{1 + y'^2}\,dx, \]

where g is some function of $x^2 + y^2$. Use the polar coordinate transformation
to find the general form of the extremals in terms of g, r, and




